SET THEORY


1. Sets and Functions

1.1. Basic definitions. Mathematics habitually deals with “sets” made up 
of “elements” of various kinds, e.g., the set of faces of a polyhedron, the 
set of points on a line, the set of all positive integers, and so on. Because of 
their generality, it is hard to define these concepts in a way that does more 
than merely replace the word “set” by some equivalent term like “class,” 
“family,” “collection,” etc. and the word “element” by some equivalent 
term like “member.” We will adopt a “naive” point of view and regard the 
notions of a set and the elements of a set as primitive and well-understood. 

The set concept plays a key role in modern mathematics. This is partly 
due to the fact that set theory, originally developed towards the end of the 
nineteenth century, has by now become an extensive subject in its own right. 
More important, however, is the great influence which set theory has exerted 
and continues to exert on mathematical thought as a whole. In this chapter, 
we introduce the basic set-theoretic notions and notation to be used in the 
rest of the book. 

Sets will be denoted by capital letters like A, B, ..., and elements of
sets by small letters like a, b, ... . The set with elements a, b, c, ... is often
denoted by {a, b, c, ...}, i.e., by writing the elements of the set between
curly brackets. For example, {1} is the set whose only member is 1, while
$\{1, 2, \ldots, n, \ldots\}$ is the set of all positive integers. The statement “the
element a belongs to the set A” is written symbolically as $a \in A$, while
$a \notin A$ means that “the element a does not belong to the set A.” If every
element of a set A also belongs to a set B, we say that A is a subset of the
set B and write $A \subset B$ or $B \supset A$ (also read as “A is contained in B” or
“B contains A”). For example, the set of all even numbers is a subset of the
set of all real numbers. We say that two sets A and B are equal and write
A = B if A and B consist of precisely the same elements. Note that A = B
if and only if $A \subset B$ and $B \subset A$, i.e., if and only if every element of A is an
element of B and every element of B is an element of A. If $A \subset B$ but $A \neq B$,
we call A a proper subset of B.

Sometimes it is not known in advance whether or not a certain set (for 
example, the set of roots of a given equation) contains any elements at all. 
Thus it is convenient to introduce the concept of the empty set, i.e., the set 
containing no elements at all. This set will be denoted by the symbol $\varnothing$.
The set $\varnothing$ is clearly a subset of every set (why?).

[Figure 1: the union $A \cup B$ of two sets A and B. Figure 2: their intersection $A \cap B$.]

1.2. Operations on sets. Let A and B be any two sets. By the sum
or union of A and B, denoted by $A \cup B$, we mean the set consisting of all
elements which belong to at least one of the sets A and B (see Figure 1).
More generally, by the sum or union of an arbitrary number (finite or
infinite) of sets $A_\alpha$ (indexed by some parameter $\alpha$), we mean the set, denoted by

$$\bigcup_\alpha A_\alpha,$$

of all elements belonging to at least one of the sets $A_\alpha$.

By the intersection $A \cap B$ of two given sets A and B, we mean the set
consisting of all elements which belong to both A and B (see Figure 2). For
example, the intersection of the set of all even numbers and the set of all
integers divisible by 3 is the set of all integers divisible by 6. By the
intersection of an arbitrary number (finite or infinite) of sets $A_\alpha$, we mean the
set, denoted by

$$\bigcap_\alpha A_\alpha,$$

of all elements belonging to every one of the sets $A_\alpha$. Two sets A and B are
said to be disjoint if $A \cap B = \varnothing$, i.e., if they have no elements in common.
More generally, let F be a family of sets such that $A \cap B = \varnothing$ for every
pair of distinct sets A, B in F. Then the sets in F are said to be pairwise disjoint.
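The definitions of union, intersection and pairwise disjointness translate directly into Python's built-in `set` type. The sketch below is only an illustration: it replaces the infinite sets of the text by their elements in 1..60 (an assumption made so the sets can be listed), and it represents an indexed family {A_α} as a dict from the index α to the set A_α. The helper names `union_all`, `intersection_all` and `pairwise_disjoint` are our own.

```python
# Finite stand-ins for the text's infinite sets: everything lives in 1..60
# (an assumption made so the sets can be listed explicitly).
UNIVERSE = range(1, 61)

evens = {n for n in UNIVERSE if n % 2 == 0}
threes = {n for n in UNIVERSE if n % 3 == 0}
sixes = {n for n in UNIVERSE if n % 6 == 0}

# Intersection of two sets: the elements belonging to both.
assert evens & threes == sixes


def union_all(family):
    """Union of an indexed family {alpha: A_alpha}: elements in at least one member."""
    return set().union(*family.values())


def intersection_all(family):
    """Intersection of a nonempty indexed family: elements in every member."""
    members = list(family.values())
    return set(members[0]).intersection(*members[1:])


def pairwise_disjoint(family):
    """True if A & B is empty for every pair of members with different indices."""
    members = list(family.values())
    return all(not (members[i] & members[j])
               for i in range(len(members))
               for j in range(i + 1, len(members)))


# The classes of remainders mod 3 form a pairwise disjoint family covering UNIVERSE.
residues = {r: {n for n in UNIVERSE if n % 3 == r} for r in range(3)}
print(pairwise_disjoint(residues))           # True
print(union_all(residues) == set(UNIVERSE))  # True
```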


It is an immediate consequence of the above definitions that the operations
$\cup$ and $\cap$ are commutative and associative, i.e., that

$$A \cup B = B \cup A, \qquad (A \cup B) \cup C = A \cup (B \cup C),$$

$$A \cap B = B \cap A, \qquad (A \cap B) \cap C = A \cap (B \cap C).$$

Moreover, the operations $\cup$ and $\cap$ obey the following distributive laws:

$$(A \cup B) \cap C = (A \cap C) \cup (B \cap C), \tag{1}$$

$$(A \cap B) \cup C = (A \cup C) \cap (B \cup C). \tag{2}$$

For example, suppose $x \in (A \cup B) \cap C$, so that x belongs to the left-hand
side of (1). Then x belongs to both C and $A \cup B$, i.e., x belongs to both
C and at least one of the sets A and B. But then x belongs to at least one of
the sets $A \cap C$ and $B \cap C$, i.e., $x \in (A \cap C) \cup (B \cap C)$, so that x belongs
to the right-hand side of (1). Conversely, suppose $x \in (A \cap C) \cup (B \cap C)$.
Then x belongs to at least one of the two sets $A \cap C$ and $B \cap C$. It follows
that x belongs to both C and at least one of the two sets A and B, i.e., $x \in C$
and $x \in A \cup B$, or equivalently $x \in (A \cup B) \cap C$. This proves (1), and (2) is
proved similarly.

[Figure 3: the difference $A - B$. Figure 4: the symmetric difference $A \,\triangle\, B$.]
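A finite check cannot replace the proof above, but it is a quick sanity test of laws (1) and (2). The sketch below runs over all triples of subsets of a small universe ({1, 2, 3, 4}, an arbitrary choice) and compares both sides of each law; the helper names are hypothetical.

```python
from itertools import combinations


def all_subsets(universe):
    """Every subset of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_distributive_laws(universe):
    """Compare both sides of laws (1) and (2) for all triples A, B, C."""
    subsets = all_subsets(universe)
    for A in subsets:
        for B in subsets:
            for C in subsets:
                # Law (1): (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
                if (A | B) & C != (A & C) | (B & C):
                    return False
                # Law (2): (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
                if (A & B) | C != (A | C) & (B | C):
                    return False
    return True


print(check_distributive_laws({1, 2, 3, 4}))  # True
```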

By the difference $A - B$ between two sets A and B (in that order), we
mean the set of all elements of A which do not belong to B (see Figure 3).
Note that it is not assumed that $A \supset B$. It is sometimes convenient (e.g., in
measure theory) to consider the symmetric difference of two sets A and B,
denoted by $A \,\triangle\, B$ and defined as the union of the two differences $A - B$
and $B - A$ (see Figure 4):

$$A \,\triangle\, B = (A - B) \cup (B - A).$$
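For finite sets, Python writes the difference as `A - B` and the symmetric difference as `A ^ B`. A minimal illustration, with sample sets of our own choosing:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

difference = A - B            # elements of A not in B (B need not be a subset of A)
sym_diff = (A - B) | (B - A)  # the symmetric difference as defined in the text

# Python's built-in ^ operator on sets computes the same symmetric difference.
assert sym_diff == A ^ B
print(sorted(difference), sorted(sym_diff))  # [1, 2] [1, 2, 5]
```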

We will often be concerned later with various sets which are all subsets 
of some underlying basic set R, for example, various sets of points on the 
real line. In this case, given a set A, the difference $R - A$ is called the
complement of A, denoted by CA.
An important role is played in set theory and its applications by the
following “duality principle”:

$$R - \bigcup_\alpha A_\alpha = \bigcap_\alpha (R - A_\alpha), \tag{3}$$

$$R - \bigcap_\alpha A_\alpha = \bigcup_\alpha (R - A_\alpha). \tag{4}$$

In words, the complement of a union equals the intersection of the
complements, and the complement of an intersection equals the union of the
complements. According to the duality principle, any theorem involving a
family of subsets of a fixed set R can be converted automatically into another,
“dual” theorem by replacing all subsets by their complements, all unions
by intersections and all intersections by unions. To prove (3), suppose

$$x \in R - \bigcup_\alpha A_\alpha. \tag{5}$$

Then x does not belong to the union

$$\bigcup_\alpha A_\alpha, \tag{6}$$

i.e., x does not belong to any of the sets $A_\alpha$. It follows that x belongs to each
of the complements $R - A_\alpha$, and hence

$$x \in \bigcap_\alpha (R - A_\alpha). \tag{7}$$

Conversely, suppose (7) holds, so that x belongs to every set $R - A_\alpha$. Then
x does not belong to any of the sets $A_\alpha$, i.e., x does not belong to the union
(6), or equivalently (5) holds. This proves (3), and (4) is proved similarly
(give the details).
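The duality principle can likewise be checked mechanically for a finite family. In this sketch the basic set R is taken to be {0, ..., 19} and the family has three members; both choices are assumptions for illustration only, and `complement` is a helper of our own.

```python
# A finite stand-in for the basic set R (an assumption) and a family {A_alpha} of subsets.
R = set(range(20))
family = {
    1: {n for n in R if n % 2 == 0},
    2: {n for n in R if n % 3 == 0},
    3: {n for n in R if n < 5},
}


def complement(S):
    """Complement of S relative to the basic set R, i.e. R - S."""
    return R - S


union = set().union(*family.values())
intersection = R.intersection(*family.values())
complements = [complement(S) for S in family.values()]

# (3): the complement of the union is the intersection of the complements.
law3 = complement(union) == R.intersection(*complements)
# (4): the complement of the intersection is the union of the complements.
law4 = complement(intersection) == set().union(*complements)
print(law3, law4)  # True True
```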

Remark. The designation “symmetric difference” for the set $A \,\triangle\, B$ is
not too apt, since $A \,\triangle\, B$ has much in common with the sum $A \cup B$. In fact,
in $A \cup B$ the two statements “x belongs to A” and “x belongs to B” are
joined by the conjunction “or” used in the “either ... or ... or both”
sense, while in $A \,\triangle\, B$ the same two statements are joined by “or” used in the
ordinary “either ... or ...” sense (as in “to be or not to be”). In other words,
x belongs to $A \cup B$ if and only if x belongs to either A or B or both, while x
belongs to $A \,\triangle\, B$ if and only if x belongs to either A or B but not both. The
set $A \,\triangle\, B$ can be regarded as a kind of “modulo-two sum” of the sets A and
B, i.e., a sum of the sets A and B in which elements are dropped if they are
counted twice (once in A and once in B).
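The “modulo-two” reading of the symmetric difference amounts to counting, for each element, how many of the sets contain it. The sketch below (sample sets and the helper `in_count` are our own) checks that reading, and also shows, as an extension not stated in the text, that iterating the operation (A Δ B Δ C) keeps exactly the elements that occur an odd number of times.

```python
def in_count(x, *sets):
    """Number of the given sets that contain x."""
    return sum(x in S for S in sets)


A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
universe = A | B | C

# x is in A ∪ B iff it lies in at least one of A, B ("either ... or ... or both").
assert all((x in A | B) == (in_count(x, A, B) >= 1) for x in universe)
# x is in A Δ B iff it lies in exactly one of A, B: an odd count ("modulo two").
assert all((x in A ^ B) == (in_count(x, A, B) % 2 == 1) for x in universe)
# Extension: A Δ B Δ C keeps exactly the elements counted an odd number of times.
assert A ^ B ^ C == {x for x in universe if in_count(x, A, B, C) % 2 == 1}
print(sorted(A ^ B ^ C))  # [1, 3, 5]
```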

1.3. Functions and mappings. Images and preimages. A rule associating a
unique real number $y = f(x)$ with each element of a set of real numbers X
is said to define a (real) function f on X. The set X is called the domain
(of definition) of f, and the set Y of all numbers f(x) such that $x \in X$ is called
the range of f.


More generally, let M and N be two arbitrary sets. Then a rule associating
a unique element $b = f(a) \in N$ with each element $a \in M$ is again said to define
a function f on M (or a function f with domain M). In this more general
context, f is usually called a mapping of M into N. By the same token, f is
said to map M into N (and a into b).

If a is an element of M, the corresponding element $b = f(a)$ is called the
image of a (under the mapping f). Every element of M with a given element
$b \in N$ as its image is called a preimage of b. Note that in general b may have
several preimages. Moreover, N may contain elements with no preimages
at all. If b has a unique preimage, we denote this preimage by $f^{-1}(b)$.

If A is a subset of M, the set of all elements $f(a) \in N$ such that $a \in A$
is called the image of A, denoted by f(A). The set of all elements of M whose
images belong to a given set $B \subset N$ is called the preimage of B, denoted
by $f^{-1}(B)$. If no element of B has a preimage, then $f^{-1}(B) = \varnothing$. A function
f is said to map M into N if $f(M) \subset N$, as is always the case, and onto N
if $f(M) = N$.¹ Thus every “onto mapping” is an “into mapping,” but not
conversely.

Suppose f maps M onto N. Then f is said to be one-to-one if each element
$b \in N$ has a unique preimage $f^{-1}(b)$. In this case, f is said to establish a
one-to-one correspondence between M and N, and the mapping $f^{-1}$
associating $f^{-1}(b)$ with each $b \in N$ is called the inverse of f.

Theorem 1. The preimage of the union of two sets is the union of the
preimages of the sets:

$$f^{-1}(A \cup B) = f^{-1}(A) \cup f^{-1}(B).$$

Proof. If $x \in f^{-1}(A \cup B)$, then $f(x) \in A \cup B$, so that f(x) belongs
to at least one of the sets A and B. But then x belongs to at least one of
the sets $f^{-1}(A)$ and $f^{-1}(B)$, i.e., $x \in f^{-1}(A) \cup f^{-1}(B)$.

Conversely, if $x \in f^{-1}(A) \cup f^{-1}(B)$, then x belongs to at least one
of the sets $f^{-1}(A)$ and $f^{-1}(B)$. Therefore f(x) belongs to at least one of
the sets A and B, i.e., $f(x) \in A \cup B$. But then $x \in f^{-1}(A \cup B)$. ■²

Theorem 2. The preimage of the intersection of two sets is the
intersection of the preimages of the sets:

$$f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B).$$

Proof. If $x \in f^{-1}(A \cap B)$, then $f(x) \in A \cap B$, so that $f(x) \in A$ and
$f(x) \in B$. But then $x \in f^{-1}(A)$ and $x \in f^{-1}(B)$, i.e., $x \in f^{-1}(A) \cap f^{-1}(B)$.

Conversely, if $x \in f^{-1}(A) \cap f^{-1}(B)$, then $x \in f^{-1}(A)$ and $x \in f^{-1}(B)$.
Therefore $f(x) \in A$ and $f(x) \in B$, i.e., $f(x) \in A \cap B$. But then
$x \in f^{-1}(A \cap B)$. ■


¹ As in the case of real functions, the set f(M) is called the range of f.

² The symbol ■ stands for Q.E.D. and indicates the end of a proof.
Theorem 3. The image of the union of two sets equals the union of the
images of the sets:

$$f(A \cup B) = f(A) \cup f(B).$$

Proof. If $y \in f(A \cup B)$, then $y = f(x)$ where x belongs to at least one
of the sets A and B. Therefore $y = f(x)$ belongs to at least one of the sets
f(A) and f(B), i.e., $y \in f(A) \cup f(B)$.

Conversely, if $y \in f(A) \cup f(B)$, then $y = f(x)$ where x belongs to at
least one of the sets A and B, i.e., $x \in A \cup B$ and hence $y = f(x) \in
f(A \cup B)$. ■
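For finite sets, a mapping f of M into N can be stored as a Python dict, and images and preimages computed directly from their definitions. The sketch below uses a hypothetical mapping and helper names of our own; it checks Theorems 1–3 for every pair of subsets and shows that an element with no preimage (here "w") has an empty preimage.

```python
from itertools import combinations


def image(f, A):
    """f(A): the set of all f(a) with a in A; f is a dict representing a map M -> N."""
    return {f[a] for a in A}


def preimage(f, B):
    """f^{-1}(B): all elements of M whose images belong to B (possibly empty)."""
    return {a for a in f if f[a] in B}


def subsets(S):
    """All subsets of a finite set S."""
    items = list(S)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# A hypothetical map of M = {0, ..., 5} into N = {"x", "y", "z", "w"}; "w" has no preimage.
f = {0: "x", 1: "x", 2: "y", 3: "y", 4: "z", 5: "z"}
M = set(f)
N = {"x", "y", "z", "w"}

thm1 = all(preimage(f, A | B) == preimage(f, A) | preimage(f, B)
           for A in subsets(N) for B in subsets(N))
thm2 = all(preimage(f, A & B) == preimage(f, A) & preimage(f, B)
           for A in subsets(N) for B in subsets(N))
thm3 = all(image(f, A | B) == image(f, A) | image(f, B)
           for A in subsets(M) for B in subsets(M))
print(thm1, thm2, thm3)    # True True True
print(preimage(f, {"w"}))  # set()
```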

Remark 1. Surprisingly enough, the image of the intersection of two sets
does not necessarily equal the intersection of the images of the sets. For
example, suppose the mapping f projects the xy-plane onto the x-axis,
carrying the point (x, y) into the point (x, 0). Then the segments $0 \le x \le 1$,
$y = 0$ and $0 \le x \le 1$, $y = 1$ do not intersect, although their images coincide.
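The counterexample of Remark 1 can be reproduced with finitely many sample points on each segment. Restricting x to the grid 0, 0.1, ..., 1 is an assumption made so the sets are finite; `project` is our own name for the projection.

```python
def project(point):
    """Projection of the xy-plane onto the x-axis: (x, y) -> (x, 0)."""
    x, _ = point
    return (x, 0)


# Finite samples of the two segments (assumption: x runs over 0, 0.1, ..., 1).
xs = [k / 10 for k in range(11)]
lower = {(x, 0) for x in xs}   # sample of the segment 0 <= x <= 1, y = 0
upper = {(x, 1) for x in xs}   # sample of the segment 0 <= x <= 1, y = 1

image_of_intersection = {project(p) for p in lower & upper}
intersection_of_images = {project(p) for p in lower} & {project(p) for p in upper}

# The segments are disjoint, yet their images coincide.
print(len(image_of_intersection), len(intersection_of_images))  # 0 11
# In general only the inclusion f(A ∩ B) ⊂ f(A) ∩ f(B) holds.
assert image_of_intersection <= intersection_of_images
```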

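Remark 2. Theorems 1–3 continue to hold for unions and intersections
of an arbitrary number (finite or infinite) of sets $A_\alpha$:

$$f^{-1}\Big(\bigcup_\alpha A_\alpha\Big) = \bigcup_\alpha f^{-1}(A_\alpha),$$

$$f^{-1}\Big(\bigcap_\alpha A_\alpha\Big) = \bigcap_\alpha f^{-1}(A_\alpha),$$

$$f\Big(\bigcup_\alpha A_\alpha\Big) = \bigcup_\alpha f(A_\alpha).$$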

1.4. Decomposition of a set into classes. Equivalence relations.
Decompositions of a given set into pairwise disjoint subsets play an important role
in a great variety of problems. For example, the plane (regarded as a point 
set) can be decomposed into lines parallel to the x-axis, three-dimensional 
space can be decomposed into concentric spheres, the inhabitants of a given 
city can be decomposed into different age groups, and so on. Any such 
representation of a given set M as the union of a family of pairwise disjoint 
subsets of M is called a decomposition or partition of M into classes. 

A decomposition is usually made on the basis of some criterion, allowing 
us to assign the elements of M to one class or another. For example, the 
set of all triangles in the plane can be decomposed into classes of congruent 
triangles or into classes of triangles of equal area, the set of all functions 
of x can be decomposed into classes of functions all taking the same value at 
a given point x, and so on. Despite the great variety of such criteria, they 
are not completely arbitrary. For example, it is obviously impossible to 
partition all real numbers into classes by assigning the number b to the same 
class as the number a if and only if b > a. In fact, if b > a, b must be
assigned to the same class as a, but then a cannot be assigned to the same
class as b, since a < b. Moreover, since a is not greater than itself, a cannot
even be assigned to the class containing itself! As another example, it is
impossible to partition the points of the plane into classes by assigning two
points to the same class if and only if the distance between them is less than 1.
In fact, if the distance between a and b is less than 1 and if the distance
between b and c is less than 1, it does not follow that the distance between
a and c is less than 1. Thus, by assigning a to the same class as b and b to
the same class as c, we may well find that two points fall in the same class
even though the distance between them is greater than 1!

These examples suggest conditions which must be satisfied by any criterion
if it is to be used as the basis for partitioning a given set into classes. Let
M be a set, and let certain ordered pairs (a, b) of elements of M be called
“labelled.” If (a, b) is a labelled pair, we say that a is related to b by the
(binary) relation R and write aRb.³ For example, if a and b are real numbers,
aRb might mean a < b, while if a and b are triangles, aRb might mean that
a and b have the same area. A relation between elements of M is called
a relation on M if there is at least one labelled pair (a, b) for every $a \in M$.
A relation R on M is called an equivalence relation (on M) if it satisfies the
following three conditions:

1) Reflexivity: aRa for every $a \in M$;

2) Symmetry: If aRb, then bRa;

3) Transitivity: If aRb and bRc, then aRc.

Theorem 4. A set M can be partitioned into classes by a relation R
(acting as a criterion for assigning two elements to the same class) if and
only if R is an equivalence relation on M.

Proof. Every partition of M determines a binary relation on M, where
aRb means that “a belongs to the same class as b.” It is then obvious
that R must be reflexive, symmetric and transitive, i.e., that R is an
equivalence relation on M.

Conversely, let R be an equivalence relation on M, and let $K_a$ be the
set of all elements $x \in M$ such that xRa (clearly $a \in K_a$, since R is
reflexive). Then two classes $K_a$ and $K_b$ are either identical or disjoint.

In fact, suppose an element c belongs to both $K_a$ and $K_b$, so that cRa
and cRb. Then aRc by the symmetry, and hence

$$aRb \tag{8}$$

by the transitivity. If now $x \in K_a$, then xRa and hence xRb by (8) and the
transitivity, i.e., $x \in K_b$. Virtually the same argument shows that $x \in K_b$
implies $x \in K_a$. Therefore $K_a = K_b$ if $K_a$ and $K_b$ have an element in
common. Since every $a \in M$ lies in its own class $K_a$, the distinct sets $K_a$
form a partition of M into classes. ■

³ Put somewhat differently, let $M^2$ be the set of all ordered pairs (a, b) with $a, b \in M$,
and let $\mathcal{R}$ be the subset of $M^2$ consisting of all labelled pairs. Then aRb if and only if
$(a, b) \in \mathcal{R}$, i.e., a binary relation is essentially just a subset of $M^2$. As an exercise, state
the three conditions for R to be an equivalence relation in terms of ordered pairs and the
set $\mathcal{R}$.
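The construction of the classes K_a in the proof of Theorem 4 can be followed step by step when M is finite and R is given as a set of ordered pairs, as in footnote 3. In the sketch below, the relation “same remainder mod 3” is an illustrative choice, and `is_equivalence` and `classes` are hypothetical helper names.

```python
def is_equivalence(M, R):
    """Check reflexivity, symmetry and transitivity of R, a set of pairs (a, b) meaning aRb."""
    reflexive = all((a, a) in R for a in M)
    symmetric = all((b, a) in R for (a, b) in R)
    transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    return reflexive and symmetric and transitive


def classes(M, R):
    """The distinct sets K_a = {x in M : xRa}, as in the proof of Theorem 4."""
    return {frozenset(x for x in M if (x, a) in R) for a in M}


M = set(range(10))
# Illustrative relation (an assumption): aRb iff a and b leave the same remainder mod 3.
R = {(a, b) for a in M for b in M if a % 3 == b % 3}
print(is_equivalence(M, R))                      # True
print(sorted(sorted(K) for K in classes(M, R)))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# "a < b" is not reflexive, and "closer than 2" is not transitive (0~1, 1~2, but not 0~2),
# so neither can serve as a criterion for a partition.
print(is_equivalence(M, {(a, b) for a in M for b in M if a < b}))           # False
print(is_equivalence(M, {(a, b) for a in M for b in M if abs(a - b) < 2}))  # False
```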

Remark. Because of Theorem 4, one often talks about the decomposition 
of M into equivalence classes. 

There is an intimate connection between mappings and partitions into 
classes, as shown by the following examples: 

Example 1. Let f be a mapping of a set A into a set B and partition A
into sets, each consisting of all elements with the same image $b = f(a) \in B$.
This gives a partition of A into classes. For example, suppose f projects
the xy-plane onto the x-axis, by mapping the point (x, y) into the point
(x, 0). Then the preimages of the points of the x-axis are vertical lines, and
the representation of the plane as the union of these lines is the decomposition
into classes corresponding to f.
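Example 1 in code: grouping the elements of A by their image under f produces the classes of the partition. The finite grid of points standing in for the plane is an assumption, and `fibers` is a hypothetical helper name.

```python
from collections import defaultdict


def fibers(f, A):
    """Partition A into classes, each made of the elements with the same image under f."""
    groups = defaultdict(set)
    for a in A:
        groups[f(a)].add(a)
    return dict(groups)


# A finite grid standing in for the plane (an assumption), mapped by (x, y) -> (x, 0).
points = {(x, y) for x in range(3) for y in range(4)}
classes = fibers(lambda p: (p[0], 0), points)

# Each class is the sampled "vertical line" over one point of the x-axis.
for b in sorted(classes):
    print(b, sorted(classes[b]))
```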

Example 2. Given any partition of a set A into classes, let B be the set of 
these classes and associate each element a eA with the class (i.e., element 
of B) to which it belongs. This gives a mapping of A into B. For example, 
suppose we partition three-dimensional space into classes by assigning to the 
same class all points which are equidistant from the origin of coordinates. 
Then every class is a sphere of a certain radius. The set of all these classes 
can be identified with the set of points on the half-line $[0, \infty)$, each point
corresponding to a possible value of the radius. In this sense, the
decomposition of space into concentric spheres corresponds to the mapping of
space into the half-line $[0, \infty)$.

Example 3. Suppose we assign all real numbers with the same fractional
part⁴ to the same class. Then the mapping corresponding to this partition
has the effect of “winding” the real line onto a circle of unit circumference.

Problem 1. Prove that if $A \cup B = A$ and $A \cap B = A$, then $A = B$.

Problem 2. Show that in general $(A - B) \cup B \neq A$.

Problem 3. Let $A = \{2, 4, \ldots, 2n, \ldots\}$ and $B = \{3, 6, \ldots, 3n, \ldots\}$.
Find $A \cap B$ and $A - B$.
Problem 4. Prove that

a) $(A - B) \cap C = (A \cap C) - (B \cap C)$;

b) $A \,\triangle\, B = (A \cup B) - (A \cap B)$.

Problem 5. Prove that

$$\bigcup_\alpha A_\alpha - \bigcup_\alpha B_\alpha \subset \bigcup_\alpha (A_\alpha - B_\alpha).$$

Problem 6. Let $A_n$ be the set of all positive integers divisible by n. Find
the sets

a) $\displaystyle\bigcup_{n=2}^{\infty} A_n$; b) $\displaystyle\bigcap_{n=2}^{\infty} A_n$.

Problem 7. Find

a) $\displaystyle\bigcup_{n=1}^{\infty} \left[a + \frac{1}{n},\; b - \frac{1}{n}\right]$.

Problem 8. Let $A_\alpha$ be the set of points lying on the curve … $(0 < x < \infty)$.
What is $\bigcap_\alpha A_\alpha$?


Problem 9. Let $y = f(x) = (x)$ for all real x, where (x) is the fractional
part of x. Prove that every closed interval of length 1 has the same image
under f. What is this image? Is f one-to-one? What is the preimage of the
interval … < y < …? Partition the real line into classes of points with the
same image.

Problem 10. Given a set M, let $\mathcal{R}$ be the set of all ordered pairs of the
form (a, a) with $a \in M$, and let aRb if and only if $(a, b) \in \mathcal{R}$. Interpret the
relation R.


Problem 11. Give an example of a binary relation which is 

a) Reflexive and symmetric, but not transitive; 

b) Reflexive, but neither symmetric nor transitive; 

c) Symmetric, but neither reflexive nor transitive; 

d) Transitive, but neither reflexive nor symmetric. 


⁴ The largest integer $\le x$ is called the integral part of x, denoted by [x], and the quantity
$x - [x]$ is called the fractional part of x.


2. Equivalence of Sets. The Power of a Set


2.1. Finite and infinite sets. The set of all vertices of a given polyhedron, 
the set of all prime numbers less than a given number, and the set of all 
residents of New York City (at a given time) have a certain property in
common, namely, each set has a definite number of elements which can be
found in principle, if not in practice. Accordingly, these sets are all said to
be finite. Clearly, we can be sure that a set is finite without knowing the 
number of elements in it. On the other hand, the set of all positive integers, 
the set of all points on the line, the set of all circles in the plane, and the 
set of all polynomials with rational coefficients have a different property 
in common, namely, if we remove one element from each set, then remove 
two elements, three elements, and so on, there will still be elements left in 
the set at each stage. Accordingly, sets of this kind are said to be infinite. 

Given two finite sets, we can always decide whether or not they have the 
same number of elements, and if not, we can always determine which set 
has more elements than the other. It is natural to ask whether the same is 
true of infinite sets. In other words, does it make sense to ask, for example, 
whether there are more circles in the plane than rational points on the line, 
or more functions defined in the interval [0, 1] than lines in space? As will 
soon be apparent, questions of this kind can indeed be answered. 

To compare two finite sets A and B, we can count the number of elements 
in each set and then compare the two numbers, but alternatively, we can try 
to establish a one-to-one correspondence between (the elements of) A and B, 
i.e., a correspondence such that each element in A corresponds to one and 
only one element in B and vice versa. It is clear that a one-to-one
correspondence between two finite sets can be set up if and only if the two sets
have the same number of elements. For example, to ascertain whether or 
not the number of students in an assembly is the same as the number of 
seats in the auditorium, there is no need to count the number of students 
and the number of seats. We need merely observe whether or not there are 
empty seats or students with no place to sit down. If the students can all 
be seated with no empty seats left, i.e., if there is a one-to-one correspondence 
between the set of students and the set of seats, then these two sets obviously 
have the same number of elements. The important point here is that the 
first method (counting elements) works only for finite sets, while the second 
method (setting up a one-to-one correspondence) works for infinite sets as 
well as for finite sets. 

2.2. Countable sets. The simplest infinite set is the set $Z^+$ of all positive
integers. An infinite set is called countable if its elements can be put in
one-to-one correspondence with those of $Z^+$. In other words, a countable set is a
set whose elements can be numbered $a_1, a_2, \ldots, a_n, \ldots$. By an uncountable
set we mean, of course, an infinite set which is not countable.

We now give some examples of countable sets: 

Example 1. The set Z of all integers, positive, negative or zero, is
countable. In fact, we can set up the following one-to-one correspondence
between Z and the set $Z^+$ of all positive integers:

$$\begin{array}{cccccc} 0, & -1, & 1, & -2, & 2, & \ldots \\ 1, & 2, & 3, & 4, & 5, & \ldots \end{array}$$

More explicitly, we associate the nonnegative integer $n \ge 0$ with the odd
number $2n + 1$, and the negative integer $n < 0$ with the even number $2|n|$,
i.e.,

$$n \leftrightarrow 2n + 1 \ \text{ if } n \ge 0, \qquad n \leftrightarrow 2|n| \ \text{ if } n < 0$$

(the symbol $\leftrightarrow$ denotes a one-to-one correspondence).
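The correspondence between Z and the positive integers, together with its inverse, can be written as two small functions; the names `to_positive` and `to_integer` are our own.

```python
def to_positive(n: int) -> int:
    """Z -> Z+: n >= 0 goes to the odd number 2n + 1, n < 0 to the even number 2|n|."""
    return 2 * n + 1 if n >= 0 else 2 * abs(n)


def to_integer(m: int) -> int:
    """Inverse map Z+ -> Z: odd m comes from (m - 1) / 2, even m from -m / 2."""
    return (m - 1) // 2 if m % 2 == 1 else -(m // 2)


print([to_integer(m) for m in range(1, 8)])  # [0, -1, 1, -2, 2, -3, 3]
```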

Example 2. The set of all positive even numbers is countable, as shown
by the obvious correspondence $n \leftrightarrow 2n$.

Example 3. The set 2, 4, 8, ..., $2^n$, ... of powers of 2 is countable, as
shown by the obvious correspondence $n \leftrightarrow 2^n$.

Example 4. The set Q of all rational numbers is countable. To see this,
we first note that every rational number α can be written as a fraction $p/q$,
$q > 0$, in lowest terms with a positive denominator. Call the sum $|p| + q$ the
“height” of the rational number α. For example,

$$\frac{0}{1}$$

is the only rational number of height 1,

$$\frac{-1}{1}, \quad \frac{1}{1}$$

are the only rational numbers of height 2,

$$\frac{-2}{1}, \quad \frac{-1}{2}, \quad \frac{1}{2}, \quad \frac{2}{1}$$

are the only rational numbers of height 3, and so on. We can now arrange
all rational numbers in order of increasing height (with the numerators
increasing in each set of rational numbers of the same height). In other
words, we first count the rational numbers of height 1, then those of height
2 (suitably arranged), those of height 3, and so on. Since there are only
finitely many rational numbers of any given height, every rational number is
reached after finitely many steps. In this way, we assign every rational number
a unique positive integer, i.e., we set up a one-to-one correspondence between
the set Q of all rational numbers and the set $Z^+$ of all positive integers.
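The enumeration by height can be written as a Python generator. This sketch assumes the ordering described above (heights 1, 2, 3, ..., numerators increasing within a height) and uses the standard `fractions.Fraction` type; `rationals_by_height` is a hypothetical name.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def rationals_by_height():
    """Yield each rational p/q (q > 0, lowest terms) once, in order of increasing
    height |p| + q, with numerators increasing within each height."""
    for h in count(1):
        for p in range(-(h - 1), h):   # |p| <= h - 1, so that q = h - |p| >= 1
            q = h - abs(p)
            if gcd(abs(p), q) == 1:    # lowest terms only; note gcd(0, 1) == 1
                yield Fraction(p, q)


print([str(r) for r in islice(rationals_by_height(), 7)])
# ['0', '-1', '1', '-2', '-1/2', '1/2', '2']
```

Each height h contributes fewer than 2h fractions, so the inner loop always finishes and every rational appears at some finite position, which is exactly what the counting argument requires.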

Next we prove some elementary theorems involving countable sets: 

Theorem 1. Every subset of a countable set is either finite or countable.

Proof. Let A be countable, with elements $a_1, a_2, \ldots$, and let B be a
subset of A. Among the elements $a_1, a_2, \ldots$, let $a_{n_1}, a_{n_2}, \ldots$ be those in
the set B. If the set of numbers $n_1, n_2, \ldots$ has a largest number, then
B is finite. Otherwise B is countable (consider the correspondence
$i \leftrightarrow a_{n_i}$). ■

Theorem 2. The union of a finite or countable number of countable
sets $A_1, A_2, \ldots$ is itself countable.

Proof. We can assume that no two of the sets $A_1, A_2, \ldots$ have
elements in common, since otherwise we could consider the sets

$$A_1, \quad A_2 - A_1, \quad A_3 - (A_1 \cup A_2), \quad \ldots$$

instead, which are finite or countable by Theorem 1 and have the same union as
the original sets. Suppose we write the elements of $A_1, A_2, \ldots$ in the
form of an infinite table

$$\begin{array}{ccccc} a_{11} & a_{12} & a_{13} & a_{14} & \cdots \\ a_{21} & a_{22} & a_{23} & a_{24} & \cdots \\ a_{31} & a_{32} & a_{33} & a_{34} & \cdots \\ a_{41} & a_{42} & a_{43} & a_{44} & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{array} \tag{1}$$

where the elements of the set $A_1$ appear in the first row, the elements of
the set $A_2$ appear in the second row, and so on. We now count all the
elements in (1) “diagonally,” i.e., first we choose $a_{11}$, then $a_{12}$, then $a_{21}$,
and so on, moving in the way shown in the following table:⁵

$$\begin{array}{cccccccc} a_{11} & \rightarrow & a_{12} & & a_{13} & \rightarrow & a_{14} & \cdots \\ & \swarrow & & \nearrow & & \swarrow & & \\ a_{21} & & a_{22} & & a_{23} & & a_{24} & \cdots \\ \downarrow & \nearrow & & \swarrow & & & & \\ a_{31} & & a_{32} & & a_{33} & & a_{34} & \cdots \\ & \swarrow & & & & & & \\ a_{41} & & a_{42} & & a_{43} & & a_{44} & \cdots \end{array} \tag{2}$$

It is clear that this procedure associates a unique number to each element
in each of the sets $A_1, A_2, \ldots$, thereby establishing a one-to-one
correspondence between the union of the sets $A_1, A_2, \ldots$ and the set
$Z^+$ of all positive integers. ■
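The diagonal count can be expressed as a generator of index pairs (i, j). The sketch below assumes the zigzag route indicated by table (2), in which successive diagonals i + j = const are traversed in alternating directions; a route that always runs the same way along each diagonal would serve the proof equally well. The name `diagonal_indices` is our own.

```python
from itertools import count, islice


def diagonal_indices():
    """Yield the index pairs (i, j) of table (1) along the diagonals i + j = 2, 3, 4, ...,
    traversing successive diagonals in alternating directions (a zigzag route)."""
    for d in count(2):
        rows = range(1, d)            # on diagonal d: i = 1, ..., d - 1 and j = d - i
        if d % 2 == 0:
            rows = reversed(rows)     # even d: start in the first column and move up
        for i in rows:
            yield (i, d - i)


print(list(islice(diagonal_indices(), 10)))
# [(1, 1), (1, 2), (2, 1), (3, 1), (2, 2), (1, 3), (1, 4), (2, 3), (3, 2), (4, 1)]
```

Each diagonal is finite, so every entry $a_{ij}$ receives the finite position it occupies in this sequence.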

Theorem 3. Every infinite set has a countable subset.

Proof. Let M be an infinite set and $a_1$ any element of M. Being
infinite, M contains an element $a_2$ distinct from $a_1$, an element $a_3$ distinct
from both $a_1$ and $a_2$, and so on. Continuing this process (which can
never terminate due to a “shortage” of elements, since M is infinite),
we get a countable subset

$$A = \{a_1, a_2, \ldots\}$$

of the set M. ■

⁵ Discuss the obvious modifications of (1) and (2) in the case of only a finite number
of sets $A_1, A_2, \ldots$.

Remark. Theorem 3 shows that countable sets are the “smallest” infinite 
sets. The question of whether there exist uncountable (infinite) sets will be 
considered below. 

2.3. Equivalence of sets. We arrived at the notion of a countable set M
by considering one-to-one correspondences between M and the set $Z^+$ of all
positive integers. More generally, we can consider one-to-one correspondences
between any two sets M and N:

Definition. Two sets M and N are said to be equivalent (written
$M \sim N$) if there is a one-to-one correspondence between the elements of
M and the elements of N.

The concept of equivalence⁶ is applicable to both finite and infinite sets.
Two finite sets are equivalent if and only if they have the same number of
elements. We can now define a countable set as a set equivalent to the set
$Z^+$ of all positive integers. It is clear that two sets which are equivalent to a
third set are equivalent to each other, and in particular that any two countable
sets are equivalent.

Example 1. The sets of points in any two closed intervals [a, b] and [c, d]
are equivalent, and Figure 5 shows how to set up a one-to-one
correspondence between them. Here two points p and q correspond to each
other if and only if they lie on the same ray emanating from the point O in
which the extensions of the line segments ac and bd intersect.

[Figure 5: rays from the point O matching the points of [a, b] with those of [c, d].]

Example 2. The set of all points z in the complex plane is equivalent to
the set of all points α on a sphere other than the north pole O. In fact, a
one-to-one correspondence $z \leftrightarrow \alpha$ between the points of the two sets can be
established by using stereographic projection from O, as shown in Figure 6.

⁶ Not to be confused with our previous use of the word in the phrase “equivalence
relation.” However, note that set equivalence is an equivalence relation in the sense of
Sec. 1.4, being obviously reflexive, symmetric and transitive. Hence any family of sets
can be partitioned into classes of equivalent sets.

Example 3. The set of all points x in the open unit interval (0, 1) is
equivalent to the set of all points y on the whole real line. For example,
the formula

$$x = \frac{1}{\pi} \arctan y + \frac{1}{2}$$

establishes a one-to-one correspondence between these two sets.
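A numerical sketch of the arctangent correspondence of Example 3 and its inverse. Here t denotes a point of the real line and s the corresponding point of (0, 1); the function names are our own, and floating-point arithmetic only approximates the exact correspondence.

```python
import math


def to_unit_interval(t: float) -> float:
    """Send a point t of the real line to (1/pi) * arctan(t) + 1/2, a point of (0, 1)."""
    return math.atan(t) / math.pi + 0.5


def to_real_line(s: float) -> float:
    """Inverse map for 0 < s < 1: t = tan(pi * (s - 1/2))."""
    return math.tan(math.pi * (s - 0.5))


for t in (-1000.0, -1.0, 0.0, 2.5):
    s = to_unit_interval(t)
    assert 0 < s < 1  # the image lies in the open interval
    assert math.isclose(to_real_line(s), t, rel_tol=1e-9, abs_tol=1e-9)  # round trip
```

For very large |t| the floating-point value of s rounds to exactly 0 or 1, which is why the check uses moderate values.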

The last example and the examples in Sec. 2.2 show that an infinite set 
is sometimes equivalent to one of its proper subsets. For example, there are 
“as many” positive integers as integers of arbitrary sign, there are “as many” 
points in the interval (0, 1) as on the whole real line, and so on. This fact 
is characteristic of all infinite sets (and can be used to define such sets), as 
shown by 

Theorem 4. Every infinite set is equivalent to one of its proper subsets.

Proof. According to Theorem 3, any infinite set M contains a
countable subset. Let this subset be

$$A = \{a_1, a_2, \ldots\},$$

and partition A into two countable subsets

$$A_1 = \{a_1, a_3, a_5, \ldots\}, \qquad A_2 = \{a_2, a_4, a_6, \ldots\}.$$

Obviously, we can establish a one-to-one correspondence between the
countable sets A and $A_1$ (merely let $a_n \leftrightarrow a_{2n-1}$). This correspondence
can be extended to a one-to-one correspondence between the sets
$A \cup (M - A) = M$ and $A_1 \cup (M - A) = M - A_2$ by simply assigning x
itself to each element $x \in M - A$. But $M - A_2$ is a proper subset of
M. ■

2.4. Uncountability of the real numbers. Several examples of countable 
sets were given in Sec. 2.2, and many more examples of such sets could be 
given. In fact, according to Theorem 2, the union of a finite or countable 
number of countable sets is itself countable. It is now natural to ask whether 
there exist infinite sets which are uncountable. The existence of such sets 
is shown by 



Theorem 5. The set of real numbers in the closed unit interval [0, 1] is
uncountable.

Proof. Suppose we have somehow managed to count some or all of
the real numbers in [0, 1], arranging them in a list

$$\begin{aligned} \alpha_1 &= 0.a_{11}a_{12}\ldots a_{1n}\ldots, \\ \alpha_2 &= 0.a_{21}a_{22}\ldots a_{2n}\ldots, \\ &\ \ \vdots \\ \alpha_n &= 0.a_{n1}a_{n2}\ldots a_{nn}\ldots, \\ &\ \ \vdots \end{aligned} \tag{3}$$

where $a_{ik}$ is the kth digit in the decimal expansion of the number $\alpha_i$.
Consider the decimal

$$\beta = 0.b_1 b_2 \ldots b_n \ldots \tag{4}$$

constructed as follows: For $b_1$ choose any digit (from 0 to 9) different
from $a_{11}$, for $b_2$ any digit different from $a_{22}$, and so on, and in general
for $b_n$ any digit different from $a_{nn}$. Then the decimal (4) cannot coincide
with any decimal in the list (3). In fact, β differs from $\alpha_1$ in at least the
first digit, from $\alpha_2$ in at least the second digit, and so on, since in general
$b_n \neq a_{nn}$ for all n. Thus no list of real numbers in the interval [0, 1]
can include all the real numbers in [0, 1].

The above argument must be refined slightly since certain numbers,
namely those of the form $p/10^n$, can be written as decimals in two ways,
either with an infinite run of zeros or an infinite run of nines. For
example,

$$\frac{1}{2} = \frac{5}{10} = 0.5000\ldots = 0.4999\ldots,$$

so that the fact that two decimals are distinct does not necessarily mean
that they represent distinct real numbers. However, this difficulty
disappears if in constructing β, we require that β contain neither zeros
nor nines, for example by setting $b_n = 2$ if $a_{nn} = 1$ and $b_n = 1$ if
$a_{nn} \neq 1$. Then β has only one decimal expansion, so it cannot equal any
$\alpha_n$ as a real number. ■
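The diagonal construction, with the refinement that β uses only the digits 1 and 2, can be traced on a finite piece of a list. The sample digits below are hypothetical, `diagonal_number` is our own name, and of course no finite computation can reproduce the infinite argument.

```python
def diagonal_number(digit_rows):
    """Given the first n digits of n listed decimals, return the first n digits of beta,
    using b_n = 2 if a_nn == 1 and b_n = 1 otherwise (so beta has no 0s or 9s)."""
    return [2 if row[n] == 1 else 1 for n, row in enumerate(digit_rows)]


# A hypothetical list: the first five digits after the decimal point of five numbers.
listed = [
    [5, 0, 0, 0, 0],   # 0.5000...
    [4, 9, 9, 9, 9],   # 0.4999...
    [1, 1, 1, 1, 1],
    [3, 1, 4, 1, 5],
    [0, 0, 0, 0, 1],
]
beta = diagonal_number(listed)
print(beta)  # [1, 1, 2, 2, 2]
# beta differs from the n-th listed decimal in (at least) the n-th digit.
assert all(beta[n] != listed[n][n] for n in range(len(listed)))
```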

Thus the set [0, 1] is uncountable. Other examples of uncountable sets 
equivalent to [0, 1] are

1) The set of points in any closed interval [a, b];

2) The set of points on the real line;

3) The set of points in any open interval (a, b);

4) The set of all points in the plane or in space;

5) The set of all points on a sphere or inside a sphere;

6) The set of all lines in the plane;

7) The set of all continuous real functions of one or several variables. 


The fact that the sets 1) and 2) are equivalent to [0, 1] is proved as in Examples
1 and 3, pp. 13 and 14, while the fact that the sets 3)-7) are equivalent
to [0, 1] is best proved indirectly (cf. Problems 7 and 9).

2.5. The power of a set. Given any two sets M and N, suppose M and N 
are equivalent. Then M and N are said to have the same power. Roughly 
speaking, “power” is something shared by equivalent sets. If M and N are 
finite, then M and N have the same number of elements, and the concept 
of the power of a set reduces to the usual notion of the number of elements 
in a set. The power of the set $\mathbb{Z}^+$ of all positive integers, and hence the power
of any countable set, is denoted by the symbol $\aleph_0$, read “aleph null.” A
set equivalent to the set of real numbers in the interval [0, 1], and hence to
the set of all real numbers, is said to have the power of the continuum,
denoted by c (or often by $\aleph$).

For the powers of finite sets, i.e., for the positive integers, we have the 
notions of “greater than” and “less than,” as well as the notion of equality. 
We now show how these concepts are extended to the case of infinite sets. 

Let A and B be any two sets, with powers m(A) and m(B), respectively. 
If A is equivalent to B, then m(A) = m(B) by definition. If A is equivalent
to a subset of B and if no subset of A is equivalent to B, then, by analogy
with the finite case, it is natural to regard m(A) as less than m(B) or m(B) as
greater than m(A). Logically, however, there are two further possibilities:

a) B has a subset equivalent to A, and A has a subset equivalent to B; 

b) A and B are not equivalent, and neither has a subset equivalent to the 
other. 

In case a), A and B are equivalent and hence have the same power, as shown
by the Cantor-Bernstein theorem (Theorem 7 below). Case b) would obviously
show the existence of powers that cannot be compared, but it follows
from the well-ordering theorem (see Sec. 3.7) that this case is actually impossible.
Therefore, taking both of these theorems on faith, we see that any two
sets A and B either have the same power or else satisfy one of the relations
m(A) < m(B) or m(A) > m(B). For example, it is clear that $\aleph_0 <$ c
(why?).

Remark. The very deep problem of the existence of powers between $\aleph_0$
and c is touched upon in Sec. 3.9. As a rule, however, the infinite sets 
encountered in analysis are either countable or else have the power of the 
continuum. 

We have already noted that countable sets are the “smallest” infinite 
sets. It has also been shown that there are infinite sets of power greater 
than that of a countable set, namely sets with the power of the continuum. 
It is natural to ask whether there are infinite sets of power greater than that
of the continuum or, more generally, whether there is a “largest” power.
These questions are answered by

Theorem 6. Given any set M, let $\mathcal{M}$ be the set whose elements are all
possible subsets of M. Then the power of $\mathcal{M}$ is greater than the power of
the original set M.

Proof. Clearly, the power $\mu$ of the set $\mathcal{M}$ cannot be less than the power
$m$ of the original set M, since the “single-element subsets” (or “singletons”)
of M form a subset of $\mathcal{M}$ equivalent to M. Thus we need only
show that $m$ and $\mu$ do not coincide. Suppose a one-to-one correspondence

$$a \leftrightarrow A, \quad b \leftrightarrow B, \quad \ldots$$

has been established between the elements a, b, ... of M and certain
elements A, B, ... of $\mathcal{M}$ (i.e., certain subsets of M). Then A, B, ...
do not exhaust all the elements of $\mathcal{M}$, i.e., all the subsets of M. To see
this, let X be the set of elements of M which do not belong to their
“associated subsets.” More exactly, if $a \leftrightarrow A$, we assign $a$ to $X$ if $a \notin A$,
but not if $a \in A$. Clearly, X is a subset of M and hence an element of $\mathcal{M}$.
Suppose there is an element $x \in M$ such that $x \leftrightarrow X$, and consider
whether or not x belongs to X. Suppose $x \notin X$. Then $x \in X$, since, by
definition, X contains every element not contained in its associated
subset. On the other hand, suppose $x \in X$. Then $x \notin X$, since X consists
precisely of those elements which do not belong to their associated
subsets. In any event, the element x corresponding to the subset X must
simultaneously belong to X and not belong to X. But this is impossible!

It follows that there is no such element x. Therefore no one-to-one correspondence
can be established between the sets M and $\mathcal{M}$, i.e.,
$m \neq \mu$. ■
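For a small finite set, the construction of X can be checked exhaustively. The sketch below assumes M = {0, 1, 2} and uses hypothetical helper names (`subsets`, `diagonal_subset`, `x_never_in_image`). It tries every function a ↦ A from M to its subsets, not only the one-to-one ones, and confirms that the set X of elements not belonging to their associated subsets is never among the values.

```python
from itertools import combinations, product


def subsets(m):
    """All subsets of the finite set m, as frozensets."""
    items = sorted(m)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_subset(assign):
    """The set X of all a that do not belong to their associated subset assign[a]."""
    return frozenset(a for a, subset in assign.items() if a not in subset)


def x_never_in_image(m):
    """Check every function a -> A from m to its subsets: X is never a value."""
    elems = sorted(m)
    for images in product(subsets(m), repeat=len(elems)):
        assign = dict(zip(elems, images))
        if diagonal_subset(assign) in assign.values():
            return False
    return True


example = {0: frozenset({0}), 1: frozenset(), 2: frozenset({0, 1})}
print(sorted(diagonal_subset(example)))  # [1, 2]
print(x_never_in_image({0, 1, 2}))  # True: all 8**3 = 512 functions checked
```

In the example, 0 belongs to its associated subset {0} and is left out of X, while 1 and 2 do not belong to theirs and are put into X.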

Thus, given any set M, the set $\mathcal{M}$ of all its subsets has larger power, the set
of all subsets of $\mathcal{M}$ has still larger power, and so on indefinitely. In particular,
there is no set of “largest” power.

2.6. The Cantor-Bernstein theorem. Next we prove an important theorem 
already used in the preceding section: 

Theorem 7 (Cantor-Bernstein). Given any two sets A and B, suppose
A contains a subset $A_1$ equivalent to B, while B contains a subset $B_1$
equivalent to A. Then A and B are equivalent.

Proof. By hypothesis, there is a one-to-one function f mapping A
onto $B_1$ and a one-to-one function g mapping B onto $A_1$:

$$f(A) = B_1 \subset B, \qquad g(B) = A_1 \subset A.$$

Therefore

$$A_2 = gf(A) = g(f(A)) = g(B_1)$$

is a subset of $A_1$ equivalent to all of A. Similarly,

$$B_2 = f(A_1)$$

is a subset of $B_1$ equivalent to B. Let $A_3$ be the subset of A into which
the mapping $gf$ carries the set $A_1$, and let $A_4$ be the subset of A into which
$gf$ carries $A_2$. More generally, let $A_{k+2}$ be the set into which $A_k$
($k = 1, 2, \ldots$) is carried by $gf$. Then clearly

$$A \supset A_1 \supset A_2 \supset \cdots \supset A_k \supset A_{k+1} \supset \cdots.$$

Setting

$$D = \bigcap_{k=1}^{\infty} A_k,$$

we can represent A as the following union of pairwise disjoint sets:

$$A = (A - A_1) \cup (A_1 - A_2) \cup (A_2 - A_3) \cup \cdots \cup (A_k - A_{k+1}) \cup \cdots \cup D. \tag{5}$$

Similarly, we can write $A_1$ in the form

$$A_1 = (A_1 - A_2) \cup (A_2 - A_3) \cup \cdots \cup (A_k - A_{k+1}) \cup \cdots \cup D. \tag{6}$$

Clearly, (5) and (6) can be rewritten as

$$A = D \cup M \cup N, \tag{5'}$$

$$A_1 = D \cup M \cup N_1, \tag{6'}$$

where

$$
\begin{aligned}
M &= (A_1 - A_2) \cup (A_3 - A_4) \cup \cdots, \\
N &= (A - A_1) \cup (A_2 - A_3) \cup \cdots, \\
N_1 &= (A_2 - A_3) \cup (A_4 - A_5) \cup \cdots.
\end{aligned}
$$

But $A - A_1$ is equivalent to $A_2 - A_3$ (the former is carried into the latter
by the one-to-one function $gf$), $A_2 - A_3$ is equivalent to $A_4 - A_5$, and
so on. Therefore N is equivalent to $N_1$. It follows from the representations
(5') and (6') that a one-to-one correspondence can be set up
between the sets A and $A_1$. But $A_1$ is equivalent to B by hypothesis.
Therefore A is equivalent to B. ■

Remark. Here we can even “afford the unnecessary luxury” of explicitly
writing down a one-to-one function carrying A onto B, i.e.,

$$
\varphi(a) =
\begin{cases}
g^{-1}(a) & \text{if } a \in D \cup M, \\
f(a) & \text{if } a \in N.
\end{cases}
$$
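The function φ can be evaluated mechanically whenever the inverses of f and g can be computed. This rests on one observation: a lies in $A_s - A_{s+1}$ exactly when the backward chain a, g⁻¹(a), f⁻¹(g⁻¹(a)), ... can be continued for exactly s steps. Even s puts a in N, odd s puts it in M, and a chain that never stops puts it in D.

The sketch below makes several assumptions. A = B = {0, 1, 2, ...} and f = g = doubling (both one-to-one, neither onto). The inverses return `None` outside the images. The names `classify`, `make_phi`, `double` and `halve` are hypothetical. An infinite backward chain that never repeats cannot be recognized in finite time, so the sketch detects only repeating chains and otherwise gives up after `max_steps`.

```python
from typing import Callable, Optional

Inverse = Callable[[int], Optional[int]]


def classify(a: int, f_inv: Inverse, g_inv: Inverse, max_steps: int = 10_000) -> str:
    """Return 'N', 'M' or 'D' according to the decomposition (5') of A.

    Starting from a in A, apply g^{-1}, f^{-1}, g^{-1}, ... for as long as the
    preimage exists. If the chain stops after an even number s of steps, then
    a lies in A_s - A_{s+1} with s even, i.e. in N; odd s gives M. A chain that
    revisits a state never stops, so a lies in every A_k, i.e. in D.
    """
    seen = set()
    side, x, steps = "A", a, 0
    while steps <= max_steps:
        if (side, x) in seen:
            return "D"
        seen.add((side, x))
        prev = g_inv(x) if side == "A" else f_inv(x)
        if prev is None:
            return "N" if steps % 2 == 0 else "M"
        side, x, steps = ("B" if side == "A" else "A"), prev, steps + 1
    raise RuntimeError("chain too long to classify within max_steps")


def make_phi(f: Callable[[int], int], f_inv: Inverse, g_inv: Inverse) -> Callable[[int], int]:
    """phi(a) = f(a) on N and g^{-1}(a) on D and M, as in the Remark."""
    def phi(a: int) -> int:
        if classify(a, f_inv, g_inv) == "N":
            return f(a)
        return g_inv(a)  # defined because D and M lie inside A_1 = g(B)
    return phi


def double(x: int) -> int:
    return 2 * x


def halve(x: int) -> Optional[int]:
    """Inverse of double: defined only on even numbers."""
    return x // 2 if x % 2 == 0 else None


# A = B = {0, 1, 2, ...}, f = g = doubling.
phi = make_phi(double, halve, halve)
print([phi(a) for a in range(9)])  # [0, 2, 1, 6, 8, 10, 3, 14, 4]
```

Here 0 falls in D (its backward chain cycles), odd numbers fall in N and are doubled, and numbers such as 2, 6 and 8 fall in M and are halved. The result is a one-to-one correspondence of the nonnegative integers with themselves, although neither f nor g alone is onto.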



Problem 1. Prove that a set with an uncountable subset is itself uncountable.

Problem 2. Let M be any infinite set and A any countable set. Prove that 
$M \sim M \cup A$.

Problem 3. Prove that each of the following sets is countable: 

a) The set of all numbers with two distinct decimal expansions (like
0.5000... and 0.4999...);

b) The set of all rational points in the plane (i.e., points with rational
coordinates);

c) The set of all rational intervals (i.e., intervals with rational end points); 

d) The set of all polynomials with rational coefficients. 

Problem 4. A number a is called algebraic if it is a root of a polynomial 
equation with rational coefficients. Prove that the set of all algebraic numbers 
is countable. 

Problem 5. Prove the existence of uncountably many transcendental numbers,
i.e., numbers which are not algebraic.

Hint. Use Theorems 2 and 5. 

Problem 6. Prove that the set of all real functions (more generally, 
functions taking values in a set containing at least two elements) defined 
on a set M is of power greater than the power of M. In particular, prove 
that the power of the set of all real functions (continuous and discontinuous) 
defined in the interval [0, 1] is greater than c.

Hint. Use the fact that the set of all characteristic functions (i.e., functions 
taking only the values 0 and 1) on M is equivalent to the set of all subsets 
of M. 

Problem 7. Give an indirect proof of the equivalence of the closed interval 
[a, b], the open interval (a, b) and the half-open interval [a, b) or (a, b]. 

Hint. Use Theorem 7. 


(see Figure 7). 
Problem 8. Prove that the union of a finite or countable number of sets
each of power c is itself of power c. 

Problem 9. Prove that each of the following sets has the power of the 
continuum: 

a) The set of all infinite sequences of positive integers; 

b) The set of all ordered n-tuples of real numbers; 

c) The set of all infinite sequences of real numbers. 

Problem 10. Develop a contradiction inherent in the notion of the “set 
of all sets which are not members of themselves.” 

Hint. Is this set a member of itself? 

Comment. Thus we will be careful to avoid sets which are “too big,” like 
the “set of all sets.” 

3. Ordered Sets and Ordinal Numbers 

3.1. Partially ordered sets. A binary relation R on a set M is said to be a 
partial ordering (and the set M itself is said to be partially ordered) if 

1) R is reflexive (aRa for every $a \in M$);

2) R is transitive (aRb and bRc together imply aRc); 

3) R is antisymmetric in the sense that aRb and bRa together imply a = b. 

For example, if M is the set of all real numbers and aRb means $a \le b$, then
R is a partial ordering. This suggests writing $a \le b$ (or equivalently $b \ge a$)
instead of aRb whenever R is a partial ordering, and we will do so from now
on. Similarly, we write $a < b$ if $a \le b$, $a \neq b$, and $b > a$ if $b \ge a$, $b \neq a$.

The following examples give some idea of the generality of the concept 
of a partial ordering: 

Example 1. Any set M can be partially ordered in a trivial way by setting
$a \le b$ if and only if $a = b$.

Example 2. Let M be the set of all continuous functions f, g, ... defined
in a closed interval $[\alpha, \beta]$. Then we get a partial ordering by setting $f \le g$
if and only if $f(t) \le g(t)$ for every $t \in [\alpha, \beta]$.

Example 3. The set of all subsets $M_1, M_2, \ldots$ of a given set is partially ordered if
$M_1 \le M_2$ means that $M_1 \subset M_2$.

Example 4. The set of all integers greater than 1 is partially ordered if
$a \le b$ means that “b is divisible by a.”


An element a of a partially ordered set is said to be maximal if $a \le b$
implies $b = a$ and minimal if $b \le a$ implies $b = a$. Thus in Example 4 every
prime number is a minimal element.
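On a finite set, the three axioms and the notion of a minimal element can be checked by brute force. The sketch below uses hypothetical helper names (`is_partial_order`, `minimal_elements`, `divides`). It replaces the infinite set of Example 4 by the finite piece {2, 3, ..., 30}, which is an assumption made only so that the check terminates. On this piece, the minimal elements under divisibility come out as exactly the primes.

```python
from itertools import product


def is_partial_order(elements, leq) -> bool:
    """Check reflexivity, antisymmetry and transitivity of leq on a finite set."""
    elems = list(elements)
    reflexive = all(leq(a, a) for a in elems)
    antisymmetric = all(a == b for a, b in product(elems, repeat=2)
                        if leq(a, b) and leq(b, a))
    transitive = all(leq(a, c) for a, b, c in product(elems, repeat=3)
                     if leq(a, b) and leq(b, c))
    return reflexive and antisymmetric and transitive


def minimal_elements(elements, leq) -> list:
    """Elements a such that b <= a implies b == a."""
    elems = list(elements)
    return [a for a in elems if all(b == a for b in elems if leq(b, a))]


def divides(a: int, b: int) -> bool:
    """a <= b in the sense of Example 4: b is divisible by a."""
    return b % a == 0


m = range(2, 31)  # a finite piece of the set of integers greater than 1
print(is_partial_order(m, divides))  # True
print(minimal_elements(m, divides))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Maximal elements are deliberately not computed here. A truncated piece would report spurious ones (for example 30), while the full set of Example 4 has no maximal element at all.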

3.2. Order-preserving mappings. Isomorphisms. Let M and M' be any
two partially ordered sets, and let f be a one-to-one mapping of M onto M'.
Then f is said to be order-preserving if $a \le b$ (where $a, b \in M$) implies $f(a) \le f(b)$
(in M'). An order-preserving mapping f such that $f(a) \le f(b)$ implies
$a \le b$ is called an isomorphism. In other words, an isomorphism between
two partially ordered sets M and M' is a one-to-one mapping of M onto M'
such that $f(a) \le f(b)$ if and only if $a \le b$. Two partially ordered sets M
and M' are said to be isomorphic (to each other) if there exists an isomorphism
between them.

Example. Let M be the set of positive integers greater than 1 partially 
ordered as in Example 4, Sec. 3.1, and let M' be the same set partially ordered 
in the natural way, i.e., in such a way that $a \le b$ if and only if $b - a$ is
nonnegative. Then the mapping of M onto M' carrying every integer n
into itself is order-preserving, but not an isomorphism. 
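The example can be verified on a finite piece of the two orderings. The sketch below assumes the range 2, ..., 49 in place of all integers greater than 1. The helper names are hypothetical, and `is_isomorphism` checks only the order condition, since the identity map is plainly one-to-one and onto. Divisibility implies the natural inequality, but not conversely, and the first counterexample found is the pair (2, 3).

```python
def divides(a: int, b: int) -> bool:
    return b % a == 0  # ordering of M (Example 4, Sec. 3.1)


def natural_leq(a: int, b: int) -> bool:
    return b - a >= 0  # ordering of M'


def identity(n: int) -> int:
    return n


def is_order_preserving(f, elems, leq, leq2) -> bool:
    """a <= b in M implies f(a) <= f(b) in M'."""
    return all(leq2(f(a), f(b)) for a in elems for b in elems if leq(a, b))


def is_isomorphism(f, elems, leq, leq2) -> bool:
    """f(a) <= f(b) if and only if a <= b (f is assumed one-to-one and onto)."""
    return all(leq(a, b) == leq2(f(a), f(b)) for a in elems for b in elems)


elems = range(2, 50)
print(is_order_preserving(identity, elems, divides, natural_leq))  # True
print(is_isomorphism(identity, elems, divides, natural_leq))       # False
witness = next((a, b) for a in elems for b in elems
               if natural_leq(a, b) and not divides(a, b))
print(witness)  # (2, 3): 2 <= 3 in M', but 3 is not divisible by 2
```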

Isomorphism between partially ordered sets is an equivalence relation 
as defined in Sec. 1.4, being obviously reflexive, symmetric and transitive. 
Hence any given family of partially ordered sets can be partitioned into 
disjoint classes of isomorphic sets.⁷ Clearly, two isomorphic partially
ordered sets can be regarded as identical in cases where it is the structure 
of the partial ordering rather than the specific nature of the elements of the 
sets that is of interest. 

3.3. Ordered sets. Order types. Given two elements a and b of a partially 
ordered set M, it may turn out that neither of the relations $a \le b$ or $b \le a$
holds. In this case, a and b are said to be noncomparable. Thus, in general,
the relation $\le$ is defined only for certain pairs of elements, which is why M
is said to be partially ordered. However, suppose M has no noncomparable
elements. Then M is said to be ordered (synonymously, simply or linearly
ordered). In other words, a set M is ordered if it is partially ordered and if,
given any two distinct elements $a, b \in M$, either $a < b$ or $b < a$. Obviously,
any subset of an ordered set is itself ordered. 

Each of the sets figuring in Examples 1-4, Sec. 3.1 is partially ordered,
but not ordered. Simple examples of ordered sets are the set of all positive
integers, the set of all rational numbers, the set of all real numbers in the
interval [0, 1], and so on (with the usual relations of “greater than” and “less
than”).

7 Note that we avoid talking about the “family of all partially ordered sets” (recall
Problem 10, p. 20).

Since an ordered set is a special kind of partially ordered set, the concepts 
of order-preserving mapping and isomorphism apply equally well to ordered 
sets. Two isomorphic ordered sets are said to have the same (order) type. 
Thus “type” is something shared by all isomorphic ordered sets, just as 
“power” is something shared by all equivalent sets (considered as “plain” 
sets, without regard for possible orderings). 

The simplest example of an ordered set is the set of all positive integers 
1, 2, 3, ... arranged in increasing order, with the usual meaning of the
symbol <. The order type of this set is denoted by the symbol $\omega$. Two isomorphic
ordered sets obviously have the same power (an isomorphism is a
one-to-one correspondence). Thus it makes sense to talk about the power
corresponding to a given order type. For example, the power $\aleph_0$ corresponds
to the order type $\omega$. The converse is not true, since a set of a given power can
in general be ordered in many different ways. It is only in the finite case that
the number of elements in a set uniquely determines its type, designated by
the same symbol n as the number of elements in the set. For example,
besides the “natural” order type $\omega$ of the set of positive integers, there is
another order type corresponding to the sequence

1, 3, 5, ..., 2, 4, 6, ...,

where odd and even numbers are separately arranged in increasing order,
but any odd number precedes any even number. It can be shown that the
number of distinct order types of a set of power $\aleph_0$ is infinite and in fact
uncountable.
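The second ordering of the positive integers can be encoded as a sort key. This is a minimal sketch with hypothetical names (`odd_first_key`, `precedes`). It relies on Python comparing tuples lexicographically and on `False < True`. Counting predecessors of 2 among 1, ..., 1000 hints at why this ordering is not of type ω: in it, 2 is preceded by every odd number, whereas in the natural order every element has only finitely many predecessors.

```python
def odd_first_key(n: int) -> tuple[bool, int]:
    """Sort key for the ordering 1, 3, 5, ..., 2, 4, 6, ...

    Odd numbers get False as first component, even numbers get True; tuples
    compare lexicographically and False < True, so every odd number precedes
    every even number, and within each class the usual order is kept.
    """
    return (n % 2 == 0, n)


def precedes(a: int, b: int) -> bool:
    return odd_first_key(a) < odd_first_key(b)


print(sorted(range(1, 11), key=odd_first_key))  # [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]

# Every odd number precedes 2, so the count grows without bound with the range.
print(sum(precedes(k, 2) for k in range(1, 1001)))  # 500
```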

3.4. Ordered sums and products of ordered sets. Let $M_1$ and $M_2$ be two
ordered sets of types $\theta_1$ and $\theta_2$, respectively. Then we can introduce an
ordering in the union $M_1 \cup M_2$ of the two sets by assuming that

1) a and b have the same ordering as in $M_1$ if $a, b \in M_1$;

2) a and b have the same ordering as in $M_2$ if $a, b \in M_2$;

3) $a < b$ if $a \in M_1$, $b \in M_2$

(verify that this is actually an ordering of $M_1 \cup M_2$). The set $M_1 \cup M_2$
ordered in this way is called the ordered sum of $M_1$ and $M_2$, denoted by
$M_1 + M_2$. Note that the order of terms matters here, i.e., in general $M_2 + M_1$
is not isomorphic to $M_1 + M_2$. More generally, we can define the ordered
sum of any finite number of ordered sets by writing (cf. Problem 6)

$$
\begin{aligned}
M_1 + M_2 + M_3 &= (M_1 + M_2) + M_3, \\
M_1 + M_2 + M_3 + M_4 &= (M_1 + M_2 + M_3) + M_4,
\end{aligned}
$$

and so on. By the ordered sum of the types $\theta_1$ and $\theta_2$, denoted by $\theta_1 + \theta_2$,
we mean the order type of the set $M_1 + M_2$.

Example. Consider the order types $\omega$ and n. It is easy to see that
$n + \omega = \omega$. In fact, if finitely many terms are written to the left of the
sequence 1, 2, ..., k, ..., we again get a set of the same type (why?).
On the other hand, the order type $\omega + n$, i.e., the order type of the set⁸

$$\{1, 2, \ldots, k, \ldots, a_1, a_2, \ldots, a_n\},$$

is obviously not equal to $\omega$.
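Rules 1)-3) amount to comparing elements tagged with the index of their summand, first by tag and then within the summand. The sketch below assumes finite Python lists as stand-ins for the ordered sets, and `ordered_sum` is a hypothetical name. The tags also keep the summands disjoint when they share elements, as the definition implicitly requires.

```python
def ordered_sum(*summands):
    """List the elements of M_1 + M_2 + ... + M_n in increasing order.

    Each element is tagged with the (0-based) index of its summand. Python
    compares tuples lexicographically, so (i, x) < (j, y) holds exactly when
    i < j (rule 3), or i == j and x < y (rules 1 and 2).
    """
    return [(i, x) for i, m in enumerate(summands) for x in sorted(m)]


s = ordered_sum([3, 1, 2], ["b", "a"])
print(s)  # [(0, 1), (0, 2), (0, 3), (1, 'a'), (1, 'b')]
assert s == sorted(s)  # the listing agrees with lexicographic comparison
```

The same tagging explains the example $\omega + n$. An element tagged 1, such as `(1, "a_1")`, compares greater than `(0, k)` for every k. It therefore has infinitely many predecessors, something no element of a set of type $\omega$ has.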

Again let $M_1$ and $M_2$ be two ordered sets of types $\theta_1$ and $\theta_2$, respectively.
Suppose we replace each element of $M_2$ by a “replica” of the set $M_1$. Then
the resulting set, denoted by $M_1 \cdot M_2$, is called the ordered product of $M_1$
and $M_2$. More exactly, $M_1 \cdot M_2$ is the set of all pairs (a, b) where $a \in M_1$,
$b \in M_2$, ordered in such a way that

1) $(a_1, b_1) < (a_2, b_2)$ if $b_1 < b_2$ (for arbitrary $a_1, a_2$);

2) $(a_1, b) < (a_2, b)$ if $a_1 < a_2$.

Note that the order of factors matters here, i.e., in general $M_2 \cdot M_1$ is not
isomorphic to $M_1 \cdot M_2$. The ordered product of any finite number of ordered
sets can be defined by writing (cf. Problem 6)

$$
\begin{aligned}
M_1 \cdot M_2 \cdot M_3 &= (M_1 \cdot M_2) \cdot M_3, \\
M_1 \cdot M_2 \cdot M_3 \cdot M_4 &= (M_1 \cdot M_2 \cdot M_3) \cdot M_4,
\end{aligned}
$$

and so on. By the ordered product of the types $\theta_1$ and $\theta_2$, denoted by $\theta_1 \cdot \theta_2$,
we mean the order type of the set $M_1 \cdot M_2$.
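Rules 1) and 2) say that pairs are compared by their second coordinate first and by their first coordinate only as a tie-breaker. The sketch below encodes this as a sort key, assuming finite lists as stand-ins for the ordered sets. The names `product_key` and `ordered_product` are hypothetical. The output shows one replica of $M_1 = \{1, 2\}$ for each element of $M_2 = \{x, y, z\}$.

```python
def product_key(pair):
    """Compare (a, b) by the second coordinate first, then by the first (rules 1 and 2)."""
    a, b = pair
    return (b, a)


def ordered_product(m1, m2):
    """All pairs (a, b), a in m1, b in m2, listed in the order of M_1 . M_2."""
    return sorted(((a, b) for a in m1 for b in m2), key=product_key)


print(ordered_product([1, 2], ["x", "y", "z"]))
# [(1, 'x'), (2, 'x'), (1, 'y'), (2, 'y'), (1, 'z'), (2, 'z')]
```

For finite factors the two products $M_1 \cdot M_2$ and $M_2 \cdot M_1$ have the same number of elements and hence the same type. The difference shows up only for infinite factors. With this convention, for example, $2 \cdot \omega = \omega$ (a replica of a two-element set for each positive integer), while $\omega \cdot 2 = \omega + \omega$.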

3.5. Well-ordered sets. Ordinal numbers. A key concept in the theory of 
ordered sets is given by 

Definition 1. An ordered set M is said to be well-ordered if every
nonempty subset A of M has a smallest (or “first”) element, i.e., an element
$\mu$ such that $\mu \le a$ for every $a \in A$.

Example 1. Every finite ordered set is obviously well-ordered. 

Example 2. Every nonempty subset of a well-ordered set is itself well- 
ordered. 

Example 3. The set M of rational numbers in the interval [0, 1] is ordered
but not well-ordered. It is true that M has a smallest element, namely the
number 0, but the subset of M consisting of all positive rational numbers
has no smallest element.

8 Here we use the same curly bracket notation as in Sec. 1.1, but the order of terms
is now crucial.

Definition 2. The order type of a well-ordered set is called an ordinal
number or simply an ordinal.⁹ If the set is infinite, the ordinal is said to be
transfinite.

Example 4. The set of positive integers 1, 2, ..., k, ... arranged in
increasing order is well-ordered, and hence its order type $\omega$ is a (transfinite)
ordinal. The order type $\omega + n$ of the set

$$\{1, 2, \ldots, k, \ldots, a_1, a_2, \ldots, a_n\}$$

is also an ordinal.

Example 5. The set

$$\{\ldots, -n, \ldots, -3, -2, -1\} \tag{1}$$

is ordered but not well-ordered. It is true that any nonempty subset A of
(1) has a largest element (i.e., an element $\nu$ such that $a \le \nu$ for every $a \in A$),
but in general A will not have a smallest element. In fact, the set (1) itself
has no smallest element. Hence the order type of (1), denoted by $\omega^*$, is not
an ordinal number.

Theorem 1. The ordered sum of a finite number of well-ordered sets
$M_1, M_2, \ldots, M_n$ is itself a well-ordered set.

Proof. Let M be an arbitrary nonempty subset of the ordered sum
$M_1 + M_2 + \cdots + M_n$, and let $M_k$ be the first of the sets $M_1, M_2, \ldots, M_n$ (namely
the set with smallest index) containing elements of M. Then $M \cap M_k$
is a subset of the well-ordered set $M_k$, and as such has a smallest element
$\mu$. Clearly $\mu$ is the smallest element of M itself, since every element of M
outside $M_k$ belongs to a later summand and therefore follows $\mu$. ■
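The proof of Theorem 1 is effectively a two-step procedure. The sketch below carries it out on a finite sample, assuming the summands are finite sets of integers and their elements are tagged `(k, x)` with x in $M_k$. Python's `min` stands in for “the smallest element,” which exists in each $M_k$ by hypothesis, and `smallest_in_sum` is a hypothetical name. The result agrees with the minimum under the ordering of the sum.

```python
def smallest_in_sum(subset):
    """Smallest element of a nonempty subset of M_1 + ... + M_n.

    Elements are tagged pairs (k, x) with x in M_k. Following the proof of
    Theorem 1: find the first summand M_k meeting the subset, then take the
    smallest x with (k, x) in the subset.
    """
    k = min(i for i, _ in subset)
    return (k, min(x for i, x in subset if i == k))


sample = {(2, 5), (1, 7), (1, 3), (3, 0)}
print(smallest_in_sum(sample))  # (1, 3)
assert smallest_in_sum(sample) == min(sample)  # agrees with the sum ordering
```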

Corollary. The ordered sum of a finite number of ordinal numbers is 
itself an ordinal number. 

Thus new ordinal numbers can be constructed from any given set of
ordinal numbers. For example, starting from the positive integers (i.e., the
finite ordinal numbers) and the ordinal number $\omega$, we can construct the new
ordinal numbers

$$\omega + n, \quad \omega + \omega, \quad \omega + \omega + n, \quad \omega + \omega + \omega,$$

and so on. 

Theorem 2. The ordered product of two well-ordered sets $M_1$ and $M_2$
is itself a well-ordered set.

Proof. Let M be an arbitrary nonempty subset of $M_1 \cdot M_2$, so that M is a set of
ordered pairs (a, b) with $a \in M_1$, $b \in M_2$. The set of all second elements b
of pairs in M is a subset of $M_2$, and as such has a smallest element since
$M_2$ is well-ordered. Let $b_1$ denote this smallest element, and consider
all pairs of the form $(a, b_1)$ contained in M. The set of all first elements
a of these pairs is a subset of $M_1$, and as such has a smallest element
since $M_1$ is well-ordered. Let $a_1$ denote this smallest element. Then the
pair $(a_1, b_1)$ is clearly the smallest element of M. ■

9 This is a good place to point out that the terms “cardinal number” and “power”
(of a set) are synonymous.
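The two steps of this proof can likewise be run on a finite sample. The sketch assumes finite sets of integers as stand-ins for $M_1$ and $M_2$, and `smallest_in_product` is a hypothetical name. As a consistency check, it compares the answer with the minimum taken under the product ordering of Sec. 3.4, i.e. by (second coordinate, first coordinate).

```python
def smallest_in_product(subset):
    """Smallest element of a nonempty set of pairs (a, b) in M_1 . M_2.

    Step 1: the smallest second coordinate b_1 (exists since M_2 is well-ordered).
    Step 2: the smallest a among the pairs (a, b_1) (exists since M_1 is well-ordered).
    """
    b1 = min(b for _, b in subset)
    a1 = min(a for a, b in subset if b == b1)
    return (a1, b1)


sample = {(5, 2), (1, 4), (3, 2), (0, 9)}
print(smallest_in_product(sample))  # (3, 2)
# Same answer as comparing by (second coordinate, first coordinate).
assert smallest_in_product(sample) == min(sample, key=lambda p: (p[1], p[0]))
```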

Corollary 1. The ordered product of a finite number of well-ordered 
sets is itself a well-ordered set. 

Corollary 2. The ordered product of a finite number of ordinal numbers
is itself an ordinal number.

Thus it makes sense to talk about the ordinal numbers

$$\omega \cdot n, \quad \omega^2, \quad \omega^2 \cdot n, \quad \omega^3,$$

and so on. It is also possible to define such ordinal numbers as¹⁰

$$\omega^{\omega}, \quad \omega^{\omega^{\omega}}, \quad \ldots$$

3.6. Comparison of ordinal numbers. If $n_1$ and $n_2$ are two finite ordinal
numbers, then they either coincide or else one is larger than the other. As
we now show, the same is true of transfinite ordinal numbers. We begin by
observing that every element a of a well-ordered set M determines an (initial)
section P, the set of all $x \in M$ such that $x < a$, and a remainder Q, the set
of all $x \in M$ such that $x \ge a$. Given any two ordinal numbers $\alpha$ and $\beta$, let
M and N be well-ordered sets of order type $\alpha$ and $\beta$, respectively. Then we
say that

1) $\alpha = \beta$ if M and N are isomorphic;

2) $\alpha < \beta$ if M is isomorphic to some section of N;

3) $\alpha > \beta$ if N is isomorphic to some section of M

(note that this definition makes sense for finite $\alpha$ and $\beta$).

Lemma. Let f be an isomorphism of a well-ordered set A onto some
subset $B \subset A$. Then $f(a) \ge a$ for all $a \in A$.

Proof. If there are elements $a \in A$ such that $f(a) < a$, then there is a
least such element since A is well-ordered. Let $a_0$ be this element, and
let $b_0 = f(a_0)$. Then $b_0 < a_0$, and hence $f(b_0) < f(a_0) = b_0$ since f is an
isomorphism. But then $b_0$ is a smaller element with $f(b_0) < b_0$, so $a_0$ is not
the smallest element such that $f(a) < a$. Contradiction! ■


10 See e.g., A. A. Fraenkel, Abstract Set Theory, third edition, North-Holland Publishing
Co., Amsterdam (1966), pp. 202-208.
It follows from the lemma that a well-ordered set A cannot be isomorphic
to any of its sections, since if A were isomorphic to the section
determined by a, then clearly $f(a) < a$. In other words, the two relations

$$\alpha = \beta, \qquad \alpha < \beta$$

are incompatible, and so are

$$\alpha = \beta, \qquad \alpha > \beta.$$

Moreover, the two relations

$$\alpha < \beta, \qquad \alpha > \beta$$

are incompatible, since otherwise we could use the transitivity to deduce
$\alpha < \alpha$, which is impossible by the lemma. Therefore, if one of the three
relations

$$\alpha < \beta, \qquad \alpha = \beta, \qquad \alpha > \beta \tag{2}$$

holds, the other two are automatically excluded. We must still show that
one of the relations (2) always holds, thereby proving that any two ordinal
numbers are comparable.

Theorem 3. Two given ordinal numbers $\alpha$ and $\beta$ satisfy one and only
one of the relations

$$\alpha < \beta, \qquad \alpha = \beta, \qquad \alpha > \beta.$$

Proof. Let $W(\alpha)$ be the set of all ordinals $< \alpha$. Any two numbers
$\gamma$ and $\gamma'$ in $W(\alpha)$ are comparable¹¹ and the corresponding ordering of
$W(\alpha)$ makes it a well-ordered set of type $\alpha$. In fact, if a set

$$A = \{\ldots, a, \ldots, b, \ldots\}$$

is of type $\alpha$, then by definition, the ordinals less than $\alpha$ are the types of
well-ordered sets isomorphic to sections of A. Hence the ordinals themselves
are in one-to-one correspondence with the elements of A. In other
words, the elements of a set of type $\alpha$ can be numbered by using the
ordinals less than $\alpha$:

$$A = \{\ldots, a_\gamma, \ldots\} \qquad (\gamma < \alpha).$$

Now let $\alpha$ and $\beta$ be any two ordinals. Then $A = W(\alpha)$ and $B = W(\beta)$ are
well-ordered sets of types $\alpha$ and $\beta$, respectively. Moreover, let $C = A \cap B$
be the intersection of the sets A and B, i.e., the set of all ordinals less than
both $\alpha$ and $\beta$. Then C is well-ordered, of type $\gamma$, say. We now show that
$\gamma \le \alpha$. If $C = A$, then obviously $\gamma = \alpha$. On the other hand, if $C \neq A$,
then C is a section of A and hence $\gamma < \alpha$. In fact, let $\xi \in C$, $\eta \in A - C$.
Then $\xi$ and $\eta$ are comparable, i.e., either $\xi < \eta$ or $\xi > \eta$. But $\eta < \xi < \beta$
is impossible, since then $\eta \in C$. Therefore $\xi < \eta$ and hence C is a
section of A, which implies $\gamma < \alpha$. Moreover, $\gamma$ is the first element of
the set $A - C$. Thus $\gamma \le \alpha$, as asserted, and similarly $\gamma \le \beta$. The case
$\gamma < \alpha$, $\gamma < \beta$ is impossible, since then $\gamma \in A - C$, $\gamma \in B - C$. But
then $\gamma \notin C$ on the one hand and $\gamma \in A \cap B = C$ on the other hand.

It follows that there are only three possibilities

$$
\begin{aligned}
&\gamma = \alpha, \quad \gamma = \beta, \quad \alpha = \beta, \\
&\gamma = \alpha, \quad \gamma < \beta, \quad \alpha < \beta, \\
&\gamma < \alpha, \quad \gamma = \beta, \quad \alpha > \beta,
\end{aligned}
$$

i.e., $\alpha$ and $\beta$ are comparable. ■

11 Recall the meaning of $\gamma < \alpha$, $\gamma' < \alpha$, and use the fact that a section of a section of
a well-ordered set is itself a section of a well-ordered set.

Theorem 4. Let A and B be well-ordered sets. Then either A is equivalent
to B or one of the sets is of greater power than the other, i.e., the powers
of A and B are comparable.

Proof. There is a definite power corresponding to each ordinal. But
we have just seen that ordinals are comparable, and so are the corresponding
powers (recall the definition of inequality of powers given in
Sec. 2.5). ■

3.7. The well-ordering theorem, the axiom of choice and equivalent assertions.
Theorem 4 shows that the powers of two well-ordered sets are always
comparable. In 1904, Zermelo succeeded in proving the

Well-ordering theorem. Every set can be well-ordered.

It follows from the well-ordering theorem and Theorem 4 that the powers of
two arbitrary sets are always comparable, a fact already used in Sec. 2.5.
Zermelo’s proof, which will not be given here,¹² rests on the following basic

Axiom of choice. Given any set M, there is a “choice function” f such
that f(A) is an element of A for every nonempty subset $A \subset M$.

We will assume the validity of the axiom of choice without further ado. 
In fact, without the axiom of choice we would be severely hampered in 
making set-theoretic constructions. However, it should be noted that from 
the standpoint of the foundations of set theory, there are still deep and 
controversial problems associated with the use of the axiom of choice. 

There are a number of assertions equivalent to the axiom of choice, i.e.,
assertions each of which both implies and is implied by the axiom of choice.
One of these is the well-ordering theorem, which obviously implies the axiom
of choice. In fact, if an arbitrary set M can be well-ordered, then, by merely
choosing the “first” element in each nonempty subset $A \subset M$, we get the function f(A)
figuring in the statement of the axiom of choice. On the other hand, the
axiom of choice implies the well-ordering theorem, as already noted without
proof.

12 A. A. Fraenkel, op. cit., pp. 222-227.
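The easy direction, from a well-ordering to a choice function, is just “take the first element.” The sketch below assumes a finite set M whose well-ordering is given as a list from first to last. The name `choice_from_well_ordering` is hypothetical. For finite sets no axiom is needed at all, so this illustrates only the shape of the argument, not its set-theoretic content.

```python
from collections.abc import Callable, Hashable, Sequence


def choice_from_well_ordering(order: Sequence[Hashable]) -> Callable[[set], Hashable]:
    """Turn a well-ordering of a finite set M (listed first to last) into a
    choice function: f(A) is the first element of A in that ordering."""
    rank = {x: i for i, x in enumerate(order)}

    def f(subset: set) -> Hashable:
        if not subset:
            raise ValueError("a choice function is only required on nonempty subsets")
        return min(subset, key=rank.__getitem__)

    return f


f = choice_from_well_ordering(["b", "c", "a"])  # b comes first, then c, then a
print(f({"a", "c"}))  # c
print(f({"a"}))       # a
```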

To state further assertions equivalent to the axiom of choice, we need 
some more terminology: 

Definition 3. Let M be a partially ordered set, and let A be any subset
of M such that a and b are comparable for every $a, b \in A$. Then A is called
a chain (in M). A chain C is said to be maximal if there is no other chain C'
in M containing C as a proper subset.

Definition 4. An element a of a partially ordered set M is called an
upper bound of a subset $M' \subset M$ if $a' \le a$ for every $a' \in M'$.

We now have the vocabulary needed to state two other assertions equivalent
to the axiom of choice:

Hausdorff’s maximal principle. Every chain in a partially ordered 
set M is contained in a maximal chain in M. 

Zorn’s lemma. If every chain in a partially ordered set M has an upper 
bound, then M contains a maximal element. 

For the proof of the equivalence of the axiom of choice, the well-ordering 
theorem, Hausdorff’s maximal principle and Zorn’s lemma, we refer the 
reader elsewhere.¹³ Of these various equivalent assertions, Zorn’s lemma is
perhaps the most useful. 
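In a finite partially ordered set, the conclusion of Hausdorff’s maximal principle can be reached by a single greedy pass, with no appeal to the axiom of choice. The real content of the principle lies in the infinite case. The sketch below assumes the divisors of 12 ordered by divisibility, and `extend_to_maximal_chain` is a hypothetical name. One pass suffices: an element skipped because it is noncomparable with some member of the chain stays noncomparable with that member, since the chain only grows.

```python
def extend_to_maximal_chain(elements, leq, chain):
    """Greedily enlarge a chain in a finite partially ordered set to a maximal one."""
    chain = list(chain)
    for x in elements:
        # Add x only if it is comparable with every element already in the chain.
        if x not in chain and all(leq(x, c) or leq(c, x) for c in chain):
            chain.append(x)
    return chain


def divides(a: int, b: int) -> bool:
    return b % a == 0


divisors_of_12 = [1, 2, 3, 4, 6, 12]
print(sorted(extend_to_maximal_chain(divisors_of_12, divides, [2])))  # [1, 2, 4, 12]
```

Starting from the chain {2}, the elements 3 and 6 are rejected because 3 is noncomparable with 2 and 6 is noncomparable with 4. The largest member 12 of the resulting chain is an upper bound for it, and in this small example it is also the maximal element promised by Zorn’s lemma.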

3.8. Transfinite induction. Mathematical propositions are very often 
proved by using the following familiar 

Theorem 5 (Mathematical induction). Given a proposition P(n) formulated
for every positive integer n, suppose that

1) P(1) is true;

2) The validity of P(k) for all $k \le n$ implies the validity of P(n + 1).

Then P(n) is true for all n = 1, 2, ...

Proof. Suppose P(n) fails to hold for some n, and let $n_1$
be the smallest integer for which P(n) is false (the existence of $n_1$
follows from the well-ordering of the positive integers). Clearly $n_1 > 1$,
so that $n_1 - 1$ is a positive integer. Therefore P(k) is valid for all
$k \le n_1 - 1$, and hence, by 2), P($n_1$) is valid as well. Contradiction! ■

13 See e.g., G. Birkhoff, Lattice Theory, third edition, American Mathematical Society,
Providence, R.I. (1967), pp. 205-206.

Replacing the set of all positive integers by an arbitrary well-ordered set,
we get

Theorem 5' (Transfinite induction). Given a well-ordered set A,¹⁴ let
P(a) be a proposition formulated for every element $a \in A$. Suppose that

1) P(a) is true for the smallest element of A;

2) The validity of P(a) for all $a < a^*$ implies the validity of P($a^*$).

Then P(a) is true for all $a \in A$.

Proof. Suppose P(a) fails to hold for some $a \in A$. Then the set $A^*$ of all
$a \in A$ for which P(a) is false is a nonempty subset of A. By the well-ordering, $A^*$
has a smallest element $a^*$. Therefore P(a) is valid for all $a < a^*$ but
not for $a^*$, contrary to 2). Contradiction! ■

Remark. Since any set can be well-ordered, by the well-ordering theorem, 
transfinite induction can in principle be applied to any set M whatsoever. 
In practice, however, Zorn’s lemma is a more useful tool, requiring only that 
M be partially ordered. 

3.9. Historical remarks. Set theory as a branch of mathematics in its 
own right stems from the pioneer work of Georg Cantor (1845-1918). 
Originally met with disbelief, Cantor’s ideas subsequently became widespread. 
By now, the set-theoretic point of view has become standard in the most 
diverse fields of mathematics. Basic concepts, like groups, rings, fields, linear 
spaces, etc. are habitually defined as sets of elements of an arbitrary kind 
obeying appropriate axioms. 

Further development of set theory led to a number of logical difficulties, 
which naturally gave rise to attempts to replace “naive” set theory by a more 
rigorous, axiomatic set theory. It turns out that certain set-theoretic questions, 
which would at first seem to have “yes” or “no” answers, are in fact of a 
different kind. Thus it was shown by Gödel in 1940 that a negative answer
to the question “Is there an uncountable set of power less than that of the
continuum?” is consistent with set theory (axiomatized in a way we will not
discuss here), but it was recently shown by Cohen that an affirmative answer 
to the question is also consistent in the same sense! 

Problem 1. Exhibit both a partial ordering and a simple ordering of the 
set of all complex numbers. 

Problem 2. What is the minimal element of the set of all subsets of a
given set X, partially ordered by set inclusion? What is the maximal element?

Problem 3. A partially ordered set M is said to be a directed set if, given 
any two elements $a, b \in M$, there is an element $c \in M$ such that $a \le c$, $b \le c$.
Are the partially ordered sets in Examples 1-4, Sec. 3.1 all directed sets? 


14 For example, the set of all transfinite ordinals less than a given ordinal. 
Problem 4. By the greatest lower bound of two elements a and b of a
partially ordered set M, we mean an element $c \in M$ such that $c \le a$, $c \le b$
and there is no element $d \in M$ such that $c < d \le a$, $d \le b$. Similarly, by
the least upper bound of a and b, we mean an element $c \in M$ such that $a \le c$,
$b \le c$ and there is no element $d \in M$ such that $a \le d < c$, $b \le d$. By a
lattice is meant a partially ordered set any two elements of which have both
a greatest lower bound and a least upper bound. Prove that the set of all
subsets of a given set X, partially ordered by set inclusion, is a lattice. What
is the set-theoretic meaning of the greatest lower bound and least upper
bound of two elements of this set?

Problem 5. Prove that an order-preserving mapping of one ordered set 
onto another is automatically an isomorphism. 

Problem 6. Prove that ordered sums and products of ordered sets are
associative, i.e., prove that if $M_1$, $M_2$, and $M_3$ are ordered sets, then

$$(M_1 + M_2) + M_3 = M_1 + (M_2 + M_3), \qquad (M_1 \cdot M_2) \cdot M_3 = M_1 \cdot (M_2 \cdot M_3),$$

where the operations + and · are the same as in Sec. 3.4.

Comment. This allows us to drop parentheses in writing ordered sums 
and products. 

Problem 7. Construct well-ordered sets with ordinals

$$\omega + n, \quad \omega + \omega, \quad \omega + \omega + n, \quad \omega + \omega + \omega, \quad \ldots$$

Show that the sets are all countable. 

Problem 8. Construct well-ordered sets with ordinals
$$\omega \cdot n, \quad \omega^2, \quad \omega^2 \cdot n, \quad \omega^3, \quad \ldots$$

Show that the sets are all countable. 

Problem 9. Show that

$$\omega + \omega = \omega \cdot 2, \qquad \omega + \omega + \omega = \omega \cdot 3, \qquad \ldots$$

Problem 10. Prove that the set $W(\alpha)$ of all ordinals less than a given
ordinal $\alpha$ is well-ordered.

Problem 11. Prove that any nonempty set of ordinals is well-ordered. 

Problem 12. Prove that the set M of all ordinals corresponding to a 
countable set is itself uncountable. 

Problem 13. Let $\aleph_1$ be the power of the set M in the preceding problem.
Prove that there is no power m such that $\aleph_0 < m < \aleph_1$.
4. Systems of Sets¹⁵

4.1. Rings of sets. By a system of sets we mean any set whose elements
are themselves sets. Unless the contrary is explicitly stated, the elements
of a given system of sets will be assumed to be certain subsets of some fixed
set X. Systems of sets will usually be denoted by capital script letters like
$\mathscr{R}$, $\mathscr{S}$, etc. Our chief interest will be systems of sets which have certain
closure properties under the operations introduced in Sec. 1.1.

Definition 1. A nonempty system of sets $\mathscr{R}$ is called a ring (of sets) if
$A \,\Delta\, B \in \mathscr{R}$ and $A \cap B \in \mathscr{R}$ whenever $A \in \mathscr{R}$, $B \in \mathscr{R}$.
Since

$$A \cup B = (A \,\Delta\, B) \,\Delta\, (A \cap B), \qquad A - B = A \,\Delta\, (A \cap B),$$

we also have $A \cup B \in \mathscr{R}$ and $A - B \in \mathscr{R}$ whenever $A \in \mathscr{R}$, $B \in \mathscr{R}$.
Thus a ring of sets is a system of sets closed under the operations of
taking unions, intersections, differences, and symmetric differences.
Clearly, a ring of sets is also closed under the operations of taking finite
unions and intersections:

$$\bigcup_{k=1}^{n} A_k, \qquad \bigcap_{k=1}^{n} A_k.$$

A ring of sets must contain the empty set $\emptyset$, since $A - A = \emptyset$.
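For a finite system of finite sets, Definition 1 can be checked mechanically. The sketch below is only an illustration under stated assumptions: sets are modelled as Python frozensets, Python's `^` and `&` operators stand for Δ and ∩, and the helper names `is_ring` and `derived_ops_hold` are hypothetical. It also spot-checks the two identities used above to obtain unions and differences.

```python
from itertools import product


def is_ring(system):
    """Return True if the finite system of sets is nonempty and closed under
    symmetric difference and intersection (Definition 1)."""
    family = {frozenset(s) for s in system}
    if not family:
        return False  # a ring is a nonempty system of sets
    return all((a ^ b) in family and (a & b) in family
               for a, b in product(family, repeat=2))


def derived_ops_hold(a, b):
    """Check A ∪ B = (A Δ B) Δ (A ∩ B) and A − B = A Δ (A ∩ B) for one pair."""
    a, b = frozenset(a), frozenset(b)
    return (a | b) == (a ^ b) ^ (a & b) and (a - b) == a ^ (a & b)


# All subsets of {1, 2}: a ring (indeed an algebra with unit {1, 2}).
power_set = [set(), {1}, {2}, {1, 2}]
print(is_ring(power_set))                   # True
# Not a ring: for example {1} Δ {2} = {1, 2} is missing.
print(is_ring([{1}, {2}]))                  # False
print(derived_ops_hold({1, 2, 3}, {3, 4}))  # True
```

Since a finite check like this only inspects the listed sets, it is a convenience for small examples, not a substitute for the proofs in the text.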

A set E is called the unit of a system of sets $\mathscr{S}$ if $E \in \mathscr{S}$ and

$$A \cap E = A$$

for every $A \in \mathscr{S}$. Clearly E is unique (why?). Thus the unit of $\mathscr{S}$ is
just the maximal set of $\mathscr{S}$, i.e., the set containing all other sets of $\mathscr{S}$.

A ring of sets with a unit is called an algebra (of sets). 

Example 1. Given a set A, the system $\mathscr{M}(A)$ of all subsets of A is an
algebra of sets, with unit E = A.

Example 2. The system $\{\emptyset, A\}$ consisting of the empty set $\emptyset$ and any
nonempty set A is an algebra of sets, with E = A.

Example 3. The system of all finite subsets of a given set A is a ring of 
sets. This ring is an algebra if and only if A itself is finite. 

Example 4. The system of all bounded subsets of the real line is a ring of 
sets, which does not contain a unit. 


15 The material in this section need not be read now, since it will not be needed until 
Chapter 7. 
Theorem 1. The intersection

$$\mathscr{R} = \bigcap_{\alpha} \mathscr{R}_\alpha$$

of any set of rings $\mathscr{R}_\alpha$ is itself a ring.

Proof. An immediate consequence of Definition 1. ■

Theorem 2. Given any nonempty system of sets $\mathscr{S}$, there is a unique
ring $\mathscr{P}$ containing $\mathscr{S}$ and contained in every ring containing $\mathscr{S}$.

Proof. If $\mathscr{P}$ exists, then clearly $\mathscr{P}$ is unique (why?). To prove the
existence of $\mathscr{P}$, consider the union

$$X = \bigcup_{A \in \mathscr{S}} A$$

of all sets A belonging to $\mathscr{S}$ and the ring $\mathscr{M}(X)$ of all subsets of X. Let
$\Sigma$ be the set of all rings of sets contained in $\mathscr{M}(X)$ and containing $\mathscr{S}$.
Then the intersection

$$\mathscr{P} = \bigcap_{\mathscr{R} \in \Sigma} \mathscr{R}$$

of all these rings clearly has the desired properties. In fact, $\mathscr{P}$ obviously
contains $\mathscr{S}$. Moreover, if $\mathscr{R}^*$ is any ring containing $\mathscr{S}$, then the
intersection $\mathscr{R} = \mathscr{R}^* \cap \mathscr{M}(X)$ is a ring in $\Sigma$ and hence $\mathscr{P} \subset \mathscr{R} \subset \mathscr{R}^*$,
as required. The ring $\mathscr{P}$ is called the minimal ring generated by the system
$\mathscr{S}$, and will henceforth be denoted by $\mathscr{R}(\mathscr{S})$. ■

Remark. The set $\mathscr{M}(X)$ containing $\mathscr{R}(\mathscr{S})$ has been introduced to avoid
talking about the “set of all rings containing $\mathscr{S}$.” Such concepts as “the
set of all sets,” “the set of all rings,” etc. are inherently contradictory and
should be avoided (recall Problem 10, p. 20).
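When $\mathscr{S}$ is a finite system of finite sets, $\mathscr{R}(\mathscr{S})$ can also be computed directly. Instead of intersecting all the rings in $\Sigma$ as in the proof, the sketch below uses a different (iterative) technique: it keeps adding symmetric differences and intersections until nothing new appears. The result is the smallest system containing $\mathscr{S}$ that is closed under Δ and ∩, which is exactly $\mathscr{R}(\mathscr{S})$. The helper name `generated_ring` is hypothetical, and the input is assumed nonempty, as in Theorem 2.

```python
def generated_ring(system):
    """Close a finite system of finite sets under symmetric difference and
    intersection; the result is the minimal ring R(S) of Theorem 2."""
    ring = {frozenset(s) for s in system}
    while True:
        new = {op(a, b)
               for a in ring for b in ring
               for op in (frozenset.symmetric_difference, frozenset.intersection)}
        if new <= ring:          # nothing new: the system is already a ring
            return ring
        ring |= new              # terminates: every set lies inside the union X


print(len(generated_ring([{1, 2}])))          # 2: the algebra {∅, {1, 2}}
print(len(generated_ring([{1, 2}, {2, 3}])))  # 8: every subset of {1, 2, 3}
```

The loop always stops because every set it produces is a subset of the union X of the given sets, and X has only finitely many subsets.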

4.2. Semirings of sets. The following notion is more general than that
of a ring of sets and plays an important role in a number of problems (particularly in measure theory):

Definition 2. A system of sets $\mathscr{S}$ is called a semiring (of sets) if

1) $\mathscr{S}$ contains the empty set $\emptyset$;

2) $A \cap B \in \mathscr{S}$ whenever $A \in \mathscr{S}$, $B \in \mathscr{S}$;

3) If $\mathscr{S}$ contains the sets A and $A_1 \subset A$, then A can be represented
as a finite union

$$A = \bigcup_{k=1}^{n} A_k \tag{1}$$

of pairwise disjoint sets of $\mathscr{S}$, with the given set $A_1$ as its first term.
Remark. The representation (1) is called a finite expansion of A with
respect to the sets $A_1, A_2, \ldots, A_n$.

Example 1. Every ring of sets $\mathscr{R}$ is a semiring, since if $\mathscr{R}$ contains A and
$A_1 \subset A$, then $A = A_1 \cup A_2$ where $A_2 = A - A_1 \in \mathscr{R}$.

Example 2. The set $\mathscr{S}$ of all open intervals (a, b), closed intervals [a, b]
and half-open intervals [a, b), (a, b], including the “empty interval” $(a, a) = \emptyset$
and the single-element sets $[a, a] = \{a\}$, is a semiring but not a ring.
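To make property 3) of Definition 2 concrete, the sketch below restricts Example 2 to half-open intervals [a, b) only (a simplification of ours; these also form a semiring), represented as pairs `(a, b)` that are empty exactly when a ≥ b. The helper names are hypothetical. The intersection of two such intervals is again one, and a subinterval [c, d) of [a, b) yields the finite expansion [a, b) = [c, d) ∪ [a, c) ∪ [d, b). A union such as [0, 1) ∪ [2, 3), on the other hand, is not an interval, which is why the system is not a ring.

```python
def is_empty(iv):
    """The half-open interval [a, b) is empty exactly when a >= b."""
    a, b = iv
    return a >= b


def intersect(iv1, iv2):
    """[a, b) ∩ [c, d) = [max(a, c), min(b, d)), again a half-open interval."""
    (a, b), (c, d) = iv1, iv2
    return (max(a, c), min(b, d))


def expansion(whole, part):
    """Property 3) of Definition 2 for [c, d) ⊂ [a, b):
    [a, b) = [c, d) ∪ [a, c) ∪ [d, b), with `part` as the first term
    and empty pieces dropped."""
    (a, b), (c, d) = whole, part
    if not a <= c <= d <= b:
        raise ValueError("part must be a subinterval of whole")
    rest = [iv for iv in ((a, c), (d, b)) if not is_empty(iv)]
    return [part] + rest


print(intersect((0, 5), (3, 8)))   # (3, 5)
print(expansion((0, 10), (3, 5)))  # [(3, 5), (0, 3), (5, 10)]
print(expansion((0, 10), (0, 4)))  # [(0, 4), (4, 10)]
```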

Lemma 1. Suppose the sets $A, A_1, \ldots, A_n$, where $A_1, \ldots, A_n$ are
pairwise disjoint subsets of A, all belong to a semiring $\mathscr{S}$. Then there is a
finite expansion

$$A = \bigcup_{k=1}^{s} A_k \qquad (s \ge n)$$

with $A_1, \ldots, A_n$ as its first n terms, where $A_k \in \mathscr{S}$ and $A_k \cap A_l = \emptyset$
for all $k \ne l$ ($k, l = 1, \ldots, s$).

Proof. The lemma holds for n = 1, by the definition of a semiring.

Suppose the lemma holds for n = m, and consider m + 1 sets
$A_1, \ldots, A_m, A_{m+1}$ satisfying the conditions of the lemma. By hypothesis,

$$A = A_1 \cup \cdots \cup A_m \cup B_1 \cup \cdots \cup B_p,$$

where the sets $A_1, \ldots, A_m, B_1, \ldots, B_p$ are pairwise disjoint subsets of
A, all belonging to $\mathscr{S}$. Let

$$B_{q1} = A_{m+1} \cap B_q.$$

By the definition of a semiring,

$$B_q = B_{q1} \cup \cdots \cup B_{q r_q},$$

where the sets $B_{qj}$ $(j = 1, \ldots, r_q)$ are pairwise disjoint subsets of $B_q$,
all belonging to $\mathscr{S}$. But then it is easy to see that

$$A = A_1 \cup \cdots \cup A_m \cup A_{m+1} \cup \bigcup_{q=1}^{p} \Big(\bigcup_{j=2}^{r_q} B_{qj}\Big),$$

i.e., the lemma is true for n = m + 1. The proof now follows by mathematical induction. ■

Lemma 2. Given any finite system of sets $A_1, \ldots, A_n$ belonging to a
semiring $\mathscr{S}$, there is a finite system of pairwise disjoint sets $B_1, \ldots, B_t$
belonging to $\mathscr{S}$ such that every $A_k$ has a finite expansion

$$A_k = \bigcup_{s \in M_k} B_s \qquad (k = 1, \ldots, n)$$

with respect to certain of the sets $B_s$.¹⁶

16 Here $M_k$ denotes some subset of the set $\{1, 2, \ldots, t\}$, depending on the choice of k.







Proof. The lemma is trivial for n = 1, since we need only set t = 1,
$B_1 = A_1$. Suppose the lemma is true for n = m, and consider a system
of sets $A_1, \ldots, A_m, A_{m+1}$ in $\mathscr{S}$. Let $B_1, \ldots, B_t$ be sets of $\mathscr{S}$ satisfying
the conditions of the lemma with respect to $A_1, \ldots, A_m$, and let

$$B_{s1} = A_{m+1} \cap B_s.$$

Then, by Lemma 1, there is an expansion

$$A_{m+1} = \Big(\bigcup_{s=1}^{t} B_{s1}\Big) \cup \Big(\bigcup_{p=1}^{q} B'_p\Big) \qquad (B'_p \in \mathscr{S}),$$

while, by the very definition of a semiring, there is an expansion

$$B_s = B_{s1} \cup B_{s2} \cup \cdots \cup B_{s r_s} \qquad (B_{sj} \in \mathscr{S}).$$

It is easy to see that

$$A_k = \bigcup_{s \in M_k} \Big(\bigcup_{j=1}^{r_s} B_{sj}\Big) \qquad (k = 1, \ldots, m)$$

for some suitable $M_k$. Moreover, the sets $B_{sj}$, $B'_p$ are pairwise disjoint.
Hence the sets $B_{sj}$, $B'_p$ satisfy the conditions of the lemma with respect
to $A_1, \ldots, A_m, A_{m+1}$. The proof now follows by mathematical induction. ■
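For half-open intervals [a, b) (the simplified semiring used in the earlier interval sketch, an assumption of ours rather than the text's Example 2), the disjoint sets $B_s$ of Lemma 2 can be written down explicitly: sort all endpoints and cut the line at them. No endpoint lies strictly inside a resulting piece, so each piece is either contained in a given $A_k$ or disjoint from it. The helper `common_refinement` is hypothetical and numbers the pieces from 0 rather than 1.

```python
def common_refinement(intervals):
    """Lemma 2 for nonempty half-open intervals [a, b), given as pairs (a, b).

    Returns pairwise disjoint intervals B_0, ..., B_{t-1} and, for each input
    A_k, the index set M_k with A_k = union of B_s over s in M_k (0-based)."""
    points = sorted({x for iv in intervals for x in iv})
    # Consecutive endpoints cut the line into pieces that are either inside or
    # disjoint from each A_k; keep only the pieces covered by some A_k.
    pieces = [(c, d) for c, d in zip(points, points[1:])
              if any(a <= c and d <= b for a, b in intervals)]
    index_sets = [[s for s, (c, d) in enumerate(pieces) if a <= c and d <= b]
                  for a, b in intervals]
    return pieces, index_sets


pieces, index_sets = common_refinement([(0, 5), (3, 8), (10, 12)])
print(pieces)      # [(0, 3), (3, 5), (5, 8), (10, 12)]
print(index_sets)  # [[0, 1], [1, 2], [3]]
```

The gap [8, 10) is dropped because it lies in none of the given intervals; Lemma 2 does not require the $B_s$ to fill anything beyond the sets $A_k$.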

4.3. The ring generated by a semiring. According to Theorem 2, there is
a unique minimal ring $\mathscr{R}(\mathscr{S})$ generated by a given system of sets $\mathscr{S}$. The
actual construction of $\mathscr{R}(\mathscr{S})$ is quite complicated for arbitrary $\mathscr{S}$. However,
the construction is completely straightforward if $\mathscr{S}$ is a semiring, as shown
by

Theorem 3. If $\mathscr{S}$ is a semiring, then $\mathscr{R}(\mathscr{S})$ coincides with the system
$\mathscr{Z}$ of all sets A which have finite expansions

$$A = \bigcup_{k=1}^{n} A_k$$

with respect to the sets $A_k \in \mathscr{S}$.

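Proof. First we prove that $\mathscr{Z}$ is a ring. Let A and B be any two sets in
$\mathscr{Z}$. Then there are expansions

$$A = \bigcup_{i=1}^{m} A_i \quad (A_i \in \mathscr{S}), \qquad B = \bigcup_{j=1}^{n} B_j \quad (B_j \in \mathscr{S}).$$

Since $\mathscr{S}$ is a semiring, the sets

$$C_{ij} = A_i \cap B_j$$

also belong to $\mathscr{S}$. By Lemma 1, there are expansions

$$A_i = \Big(\bigcup_{j} C_{ij}\Big) \cup \Big(\bigcup_{k=1}^{r_i} D_{ik}\Big) \quad (D_{ik} \in \mathscr{S}), \qquad
B_j = \Big(\bigcup_{i} C_{ij}\Big) \cup \Big(\bigcup_{l=1}^{s_j} E_{jl}\Big) \quad (E_{jl} \in \mathscr{S}). \tag{2}$$

It follows from (2) that $A \cap B$ and $A \,\Delta\, B$ have the expansions

$$A \cap B = \bigcup_{i,j} C_{ij}, \qquad A \,\Delta\, B = \Big(\bigcup_{i,k} D_{ik}\Big) \cup \Big(\bigcup_{j,l} E_{jl}\Big),$$

and hence belong to $\mathscr{Z}$. Therefore $\mathscr{Z}$ is a ring. The fact that $\mathscr{Z}$ is the
minimal ring generated by $\mathscr{S}$ is obvious. ■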

4.4. Borel algebras. There are many problems (particularly in measure 
theory) involving unions and intersections not only of a finite number of 
sets, but also of a countable number of sets. This motivates the following 
concepts: 

Definition 3. A ring of sets is called a σ-ring if it contains the union

$$S = \bigcup_{n=1}^{\infty} A_n$$

whenever it contains the sets $A_1, A_2, \ldots, A_n, \ldots$. A σ-ring with a unit
E is called a σ-algebra.

Definition 4. A ring of sets is called a δ-ring if it contains the intersection

$$D = \bigcap_{n=1}^{\infty} A_n$$

whenever it contains the sets $A_1, A_2, \ldots, A_n, \ldots$. A δ-ring with a unit
E is called a δ-algebra.

Theorem 4. Every σ-algebra is a δ-algebra and conversely.

Proof. An immediate consequence of the “dual” formulas

$$\bigcup_n A_n = E - \bigcap_n (E - A_n), \qquad \bigcap_n A_n = E - \bigcup_n (E - A_n).$$

■

The term Borel algebra (or briefly, B-algebra) is often used to denote
a σ-algebra (equivalently, a δ-algebra). The simplest example of a B-algebra
is the set of all subsets of a given set A.


Given any system of sets $\mathscr{S}$, there always exists at least one B-algebra
containing $\mathscr{S}$. In fact, let

$$X = \bigcup_{A \in \mathscr{S}} A.$$

Then the system $\mathscr{B}$ of all subsets of X is clearly a B-algebra containing $\mathscr{S}$.

If $\mathscr{B}$ is any B-algebra containing $\mathscr{S}$ and if E is its unit, then every
$A \in \mathscr{S}$ is contained in E and hence

$$X = \bigcup_{A \in \mathscr{S}} A \subset E.$$

A B-algebra $\mathscr{B}$ is called irreducible (with respect to the system $\mathscr{S}$) if X = E,
i.e., an irreducible B-algebra is a B-algebra containing no points that do
not belong to one of the sets $A \in \mathscr{S}$. In every case, it will be enough to
consider only irreducible B-algebras.

Theorem 2 has the following analogue for irreducible B-algebras:

Theorem 5. Given any nonempty system of sets $\mathscr{S}$, there is a unique
irreducible¹⁷ B-algebra $\mathscr{B}(\mathscr{S})$ containing $\mathscr{S}$ and contained in every
B-algebra containing $\mathscr{S}$.

Proof. The proof is virtually identical with that of Theorem 2. The
B-algebra $\mathscr{B}(\mathscr{S})$ is called the minimal B-algebra generated by the system
$\mathscr{S}$ or the Borel closure of $\mathscr{S}$. ■

Remark. An important role is played in analysis by Borel sets or B-sets.
These are the subsets of the real line belonging to the minimal B-algebra
generated by the set of all closed intervals [a, b].

Problem 1. Let X be an uncountable set, and let $\mathscr{R}$ be the ring consisting
of all finite subsets of X and their complements. Is $\mathscr{R}$ a σ-ring?

Problem 2. Are open intervals Borel sets?

Problem 3. Let y = f(x) be a function defined on a set M and taking
values in a set N. Let $\mathscr{M}$ be a system of subsets of M, and let $f(\mathscr{M})$ denote
the system of all images f(A) of sets $A \in \mathscr{M}$. Moreover, let $\mathscr{N}$ be a system
of subsets of N, and let $f^{-1}(\mathscr{N})$ denote the system of all preimages $f^{-1}(B)$
of sets $B \in \mathscr{N}$. Prove that

a) If $\mathscr{N}$ is a ring, so is $f^{-1}(\mathscr{N})$;

b) If $\mathscr{N}$ is an algebra, so is $f^{-1}(\mathscr{N})$;

c) If $\mathscr{N}$ is a B-algebra, so is $f^{-1}(\mathscr{N})$;

d) $\mathscr{R}(f^{-1}(\mathscr{N})) = f^{-1}(\mathscr{R}(\mathscr{N}))$;

e) $\mathscr{B}(f^{-1}(\mathscr{N})) = f^{-1}(\mathscr{B}(\mathscr{N}))$.

Which of these assertions remain true if $\mathscr{N}$ is replaced by $\mathscr{M}$ and $f^{-1}$ by f?
5. Basic Concepts

5.1. Definitions and examples. One of the most important operations in 
mathematical analysis is the taking of limits. Here what matters is not so 
much the algebraic nature of the real numbers¹ but rather the fact that
distance from one point to another on the real line (or in two- or three-dimensional
space) is well-defined and has certain properties. Roughly
speaking, a metric space is a set equipped with a distance (or “metric”) 
which has these same properties. More exactly, we have 

Definition 1. By a metric space is meant a pair $(X, \rho)$ consisting of
a set X and a distance $\rho$, i.e., a single-valued, nonnegative, real function
$\rho(x, y)$ defined for all $x, y \in X$ which has the following three properties:

1) $\rho(x, y) = 0$ if and only if $x = y$;

2) Symmetry: $\rho(x, y) = \rho(y, x)$;

3) Triangle inequality: $\rho(x, z) \le \rho(x, y) + \rho(y, z)$.

We will often refer to the set X as a “space” and its elements $x, y, \ldots$ as
“points.” Metric spaces are usually denoted by a single letter, like

$$R = (X, \rho),$$

or even by the same letter X as used for the underlying space, in cases where 
there is no possibility of confusion. 


1 I.e., the fact that the real numbers form a field. 



17 More exactly, irreducible with respect to $\mathscr{S}$.
Example 1. Setting

$$\rho(x, y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{if } x \ne y, \end{cases}$$

where x and y are elements of an arbitrary set X, we obviously get a metric
space, which might be called a “discrete space” or a “space of isolated
points.”
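The three properties in Definition 1 can be spot-checked on a finite sample of points. The sketch below is a hypothetical helper, `satisfies_metric_axioms`, with a floating-point tolerance `tol` chosen by us; passing the check on a sample does not prove that a distance is a metric on a whole (possibly infinite) space, while failing it exhibits a concrete counterexample. It is applied to the discrete distance of Example 1 and to the non-metric $(x - y)^2$ on the reals.

```python
from itertools import product


def satisfies_metric_axioms(points, dist, tol=1e-12):
    """Test properties 1)-3) of Definition 1 on a finite sample of points.

    Passing this test does not prove that `dist` is a metric on a whole
    (possibly infinite) space; failing it exhibits a counterexample."""
    for x, y in product(points, repeat=2):
        d = dist(x, y)
        if d < 0 or (abs(d) <= tol) != (x == y):
            return False  # nonnegativity or property 1) fails
        if abs(d - dist(y, x)) > tol:
            return False  # symmetry 2) fails
    for x, y, z in product(points, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            return False  # triangle inequality 3) fails
    return True


def discrete(x, y):
    """The distance of Example 1: 0 if x = y, 1 otherwise."""
    return 0 if x == y else 1


print(satisfies_metric_axioms(["a", "b", "c", 42], discrete))         # True
print(satisfies_metric_axioms([0, 1, 2], lambda x, y: (x - y) ** 2))  # False
```

The second check fails at the triangle inequality: the points 0, 1, 2 give $4 > 1 + 1$.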

Example 2. The set of all real numbers with distance

$$\rho(x, y) = |x - y|$$

is a metric space, which we denote by $R^1$.

Example 3. The set of all ordered n-tuples

$$x = (x_1, x_2, \ldots, x_n)$$

of real numbers $x_1, x_2, \ldots, x_n$, with distance

$$\rho(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}, \tag{1}$$

is a metric space denoted by $R^n$ and called n-dimensional Euclidean space
(or simply Euclidean n-space). The distance (1) obviously has properties
1) and 2) in Definition 1. Moreover, it is easy to see that (1) satisfies the
triangle inequality. In fact, let

$$x = (x_1, x_2, \ldots, x_n), \qquad y = (y_1, y_2, \ldots, y_n), \qquad z = (z_1, z_2, \ldots, z_n)$$

be three points in $R^n$, and let

$$a_k = x_k - y_k, \qquad b_k = y_k - z_k \qquad (k = 1, \ldots, n).$$

Then the triangle inequality takes the form

$$\sqrt{\sum_{k=1}^{n} (x_k - z_k)^2} \le \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} + \sqrt{\sum_{k=1}^{n} (y_k - z_k)^2}, \tag{2}$$

or equivalently

$$\sqrt{\sum_{k=1}^{n} (a_k + b_k)^2} \le \sqrt{\sum_{k=1}^{n} a_k^2} + \sqrt{\sum_{k=1}^{n} b_k^2}. \tag{2'}$$

It follows from the Cauchy-Schwarz inequality

$$\left(\sum_{k=1}^{n} a_k b_k\right)^2 \le \sum_{k=1}^{n} a_k^2 \sum_{k=1}^{n} b_k^2 \tag{3}$$

(see Problem 2) that

$$\sum_{k=1}^{n} (a_k + b_k)^2 = \sum_{k=1}^{n} a_k^2 + 2\sum_{k=1}^{n} a_k b_k + \sum_{k=1}^{n} b_k^2
\le \sum_{k=1}^{n} a_k^2 + 2\sqrt{\sum_{k=1}^{n} a_k^2 \sum_{k=1}^{n} b_k^2} + \sum_{k=1}^{n} b_k^2
= \left(\sqrt{\sum_{k=1}^{n} a_k^2} + \sqrt{\sum_{k=1}^{n} b_k^2}\right)^2.$$

Taking square roots, we get (2') and hence (2).
Example 4. Take the same set of ordered n-tuples $x = (x_1, \ldots, x_n)$ as in
the preceding example, but this time define the distance by the function

$$\rho_1(x, y) = \sum_{k=1}^{n} |x_k - y_k|. \tag{4}$$

It is clear that (4) has all three properties of a distance figuring in Definition
1. The corresponding metric space will be denoted by $R_1^n$.

Example 5. Take the same set as in Examples 3 and 4, but this time
define distance between two points $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$
by the formula

$$\rho_0(x, y) = \max_{1 \le k \le n} |x_k - y_k|. \tag{5}$$

Then we again get a metric space (verify all three properties of the distance).
This space, denoted by $R_0^n$, is often as useful as the Euclidean space $R^n$.

Remark. The last three examples show that it is sometimes important 
to use a different notation for a metric space than for the underlying set of 
points in the space, since the latter can be “metrized” in a variety of different 
ways. 
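The Remark can be seen numerically: the same pair of points gets three different distances under (1), (4) and (5). The sketch below is a minimal illustration with hypothetical function names; the random spot check of the triangle inequality (seed and sample size chosen arbitrarily) is not a proof.

```python
import math
import random


def rho(x, y):
    """Euclidean distance (1) in R^n."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


def rho_1(x, y):
    """Distance (4) in R_1^n: sum of the coordinate differences."""
    return sum(abs(xk - yk) for xk, yk in zip(x, y))


def rho_0(x, y):
    """Distance (5) in R_0^n: the largest coordinate difference."""
    return max(abs(xk - yk) for xk, yk in zip(x, y))


# One pair of points, three different distances.
x, y = (0.0, 0.0), (3.0, 4.0)
print(rho(x, y), rho_1(x, y), rho_0(x, y))  # 5.0 7.0 4.0

# Random spot check of the triangle inequality for all three metrics.
random.seed(0)
for _ in range(1000):
    u, v, w = ([random.uniform(-1, 1) for _ in range(3)] for _pt in range(3))
    for d in (rho, rho_1, rho_0):
        assert d(u, w) <= d(u, v) + d(v, w) + 1e-12
```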

Example 6. The set $C_{[a,b]}$ of all continuous functions defined on the
closed interval [a, b], with distance

$$\rho(f, g) = \max_{a \le t \le b} |f(t) - g(t)| \tag{6}$$

is a metric space of great importance in analysis (again verify the three
properties of distance). This metric space and the underlying set of “points”
will both be denoted by the symbol $C_{[a,b]}$. Instead of $C_{[0,1]}$, we will often
write just C. A space like $C_{[a,b]}$ is often called a “function space,” to
emphasize that its elements are functions.

Example 7. Let $l_2$ be the set of all infinite sequences²

$$x = (x_1, x_2, \ldots, x_k, \ldots)$$

of real numbers $x_1, x_2, \ldots, x_k, \ldots$ satisfying the convergence condition

$$\sum_{k=1}^{\infty} x_k^2 < \infty,$$

2 The infinite sequence with general term $x_k$ can be written as $\{x_k\}$ or simply as
$x_1, x_2, \ldots, x_k, \ldots$ (this notation is familiar from calculus). It can also be written in
“point notation” as $x = (x_1, x_2, \ldots, x_k, \ldots)$, i.e., as an “ordered ∞-tuple” generalizing
the notion of an ordered n-tuple. (In writing $\{x_k\}$ we have another use of curly brackets,
but the context will always prevent any confusion between the sequence $\{x_k\}$ and the set
whose only element is $x_k$.)
where distance between points is defined by

$$\rho(x, y) = \sqrt{\sum_{k=1}^{\infty} (x_k - y_k)^2}. \tag{7}$$

Clearly (7) makes sense for all $x, y \in l_2$, since it follows from the elementary
inequality

$$(x_k \pm y_k)^2 \le 2(x_k^2 + y_k^2)$$

that convergence of the two series

$$\sum_{k=1}^{\infty} x_k^2, \qquad \sum_{k=1}^{\infty} y_k^2$$

implies that of the series

$$\sum_{k=1}^{\infty} (x_k \pm y_k)^2.$$

At the same time, we find that if the points $(x_1, x_2, \ldots, x_k, \ldots)$ and
$(y_1, y_2, \ldots, y_k, \ldots)$ both belong to $l_2$, then so does the point

$$(x_1 + y_1, x_2 + y_2, \ldots, x_k + y_k, \ldots).$$

The function (7) obviously has the first two defining properties of a distance.
To verify the triangle inequality, which takes the form

$$\sqrt{\sum_{k=1}^{\infty} (x_k - z_k)^2} \le \sqrt{\sum_{k=1}^{\infty} (x_k - y_k)^2} + \sqrt{\sum_{k=1}^{\infty} (y_k - z_k)^2} \tag{8}$$

for the metric (7), we first note that all three series converge, for the reason
just given. Moreover, the inequality

$$\sqrt{\sum_{k=1}^{n} (x_k - z_k)^2} \le \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} + \sqrt{\sum_{k=1}^{n} (y_k - z_k)^2} \tag{9}$$

holds for all n, as shown in Example 3. Taking the limit as $n \to \infty$ in (9),
we get (8), thereby verifying the triangle inequality in $l_2$. Therefore $l_2$ is a
metric space.

Example 8. As in Example 6, consider the set of all functions continuous
on the interval [a, b], but this time define distance by the formula

$$\rho(x, y) = \left(\int_a^b [x(t) - y(t)]^2\,dt\right)^{1/2}, \tag{10}$$

instead of (6). The resulting metric space will be denoted by $C^2_{[a,b]}$. The
first two properties of the metric are obvious, and the fact that (10) satisfies
the triangle inequality is an immediate consequence of Schwarz's inequality

$$\left(\int_a^b x(t)\,y(t)\,dt\right)^2 \le \int_a^b x^2(t)\,dt \int_a^b y^2(t)\,dt \tag{11}$$

(see Problem 3), by the continuous analogue of the argument given in
Example 3.

Example 9. Next consider the set of all bounded infinite sequences of real
numbers $x = (x_1, x_2, \ldots, x_k, \ldots)$, and let³

$$\rho(x, y) = \sup_k |x_k - y_k|. \tag{12}$$

This gives a metric space which we denote by m. The fact that (12) has the
three properties of a metric is almost obvious.

Example 10. As in Example 3, consider the set of all ordered n-tuples
$x = (x_1, \ldots, x_n)$ of real numbers, but this time define the distance by the
more general formula

$$\rho_p(x, y) = \left(\sum_{k=1}^{n} |x_k - y_k|^p\right)^{1/p}, \tag{13}$$

where p is a fixed number $\ge 1$ (Examples 3 and 4 correspond to the cases
p = 2 and p = 1, respectively). This gives a metric space, which we denote
by $R_p^n$. It is obvious that $\rho_p(x, y) = 0$ if and only if $x = y$ and that
$\rho_p(x, y) = \rho_p(y, x)$, but verification of the triangle inequality for the metric (13) requires
a little work. Let

$$x = (x_1, \ldots, x_n), \qquad y = (y_1, \ldots, y_n), \qquad z = (z_1, \ldots, z_n)$$

be three points in $R_p^n$, and let

$$a_k = x_k - y_k, \qquad b_k = y_k - z_k \qquad (k = 1, \ldots, n),$$

just as in Example 3. Then the triangle inequality

$$\rho_p(x, z) \le \rho_p(x, y) + \rho_p(y, z)$$

takes the form of Minkowski's inequality

$$\left(\sum_{k=1}^{n} |a_k + b_k|^p\right)^{1/p} \le \left(\sum_{k=1}^{n} |a_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n} |b_k|^p\right)^{1/p}. \tag{14}$$

The inequality is obvious for p = 1, and hence we can confine ourselves to
the case p > 1.

The proof of (14) for p > 1 is in turn based on Hölder's inequality

$$\sum_{k=1}^{n} |a_k b_k| \le \left(\sum_{k=1}^{n} |a_k|^p\right)^{1/p} \left(\sum_{k=1}^{n} |b_k|^q\right)^{1/q}, \tag{15}$$

where the numbers p > 1 and q > 1 satisfy the condition

$$\frac{1}{p} + \frac{1}{q} = 1. \tag{16}$$

3 The least upper bound or supremum of a sequence of real numbers $a_1, \ldots, a_k, \ldots$
is denoted by $\sup_k a_k$.
We begin by observing that the inequality (15) is homogeneous, i.e., if it
holds for two points $(a_1, \ldots, a_n)$ and $(b_1, \ldots, b_n)$, then it holds for any
two points $(\lambda a_1, \ldots, \lambda a_n)$ and $(\mu b_1, \ldots, \mu b_n)$, where $\lambda$ and $\mu$ are arbitrary
real numbers. Therefore (the case where one of the two points is zero being trivial) we need only prove (15) for the case

$$\sum_{k=1}^{n} |a_k|^p = \sum_{k=1}^{n} |b_k|^q = 1. \tag{17}$$

Thus, assuming that (17) holds, we now prove that

$$\sum_{k=1}^{n} |a_k b_k| \le 1. \tag{18}$$

Consider the two areas $S_1$ and $S_2$ shown in Figure 8, associated with the curve in the
$\xi\eta$-plane defined by the equation

$$\eta = \xi^{p-1},$$

or equivalently by the equation

$$\xi = \eta^{q-1}.$$

Figure 8

Then clearly

$$S_1 = \int_0^a \xi^{p-1}\,d\xi = \frac{a^p}{p}, \qquad S_2 = \int_0^b \eta^{q-1}\,d\eta = \frac{b^q}{q}.$$

Moreover, it is apparent from the figure that

$$S_1 + S_2 \ge ab$$

for arbitrary positive a and b. It follows that

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}. \tag{19}$$

Setting $a = |a_k|$, $b = |b_k|$, summing over k from 1 to n, and taking account
of (16) and (17), we get the desired inequality (18). This proves Hölder's
inequality (15). Note that (15) reduces to Schwarz's inequality if p = 2.
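Both (19) and (15) are easy to test numerically. The sketch below is a random spot check, not a proof: the helper names, the range of exponents, the vector length and the seed are all arbitrary choices of ours, and a small tolerance absorbs floating-point rounding.

```python
import random


def conjugate(p):
    """Exponent q with 1/p + 1/q = 1, condition (16); requires p > 1."""
    return p / (p - 1)


def holder_sides(a, b, p):
    """Left and right sides of Hölder's inequality (15)."""
    q = conjugate(p)
    left = sum(abs(ak * bk) for ak, bk in zip(a, b))
    right = (sum(abs(ak) ** p for ak in a) ** (1 / p)
             * sum(abs(bk) ** q for bk in b) ** (1 / q))
    return left, right


def young_gap(a, b, p):
    """a^p/p + b^q/q - ab for a, b >= 0; inequality (19) says this is >= 0."""
    q = conjugate(p)
    return a ** p / p + b ** q / q - a * b


random.seed(1)
for _ in range(1000):
    p = random.uniform(1.1, 5.0)
    a = [random.uniform(-2, 2) for _ in range(5)]
    b = [random.uniform(-2, 2) for _ in range(5)]
    left, right = holder_sides(a, b, p)
    assert left <= right + 1e-9
    assert young_gap(abs(a[0]), abs(b[0]), p) >= -1e-12
```

With p = 2 the function `holder_sides` compares exactly the two sides of the Cauchy-Schwarz inequality (3), up to taking square roots.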

It is now an easy matter to prove Minkowski's inequality (14), starting
from the identity

$$(|a| + |b|)^p = (|a| + |b|)^{p-1}|a| + (|a| + |b|)^{p-1}|b|.$$

In fact, setting $a = a_k$, $b = b_k$ and summing over k from 1 to n, we obtain

$$\sum_{k=1}^{n} (|a_k| + |b_k|)^p = \sum_{k=1}^{n} (|a_k| + |b_k|)^{p-1}|a_k| + \sum_{k=1}^{n} (|a_k| + |b_k|)^{p-1}|b_k|.$$

Next we apply Hölder's inequality (15) to both sums on the right, bearing
in mind that $(p - 1)q = p$:

$$\sum_{k=1}^{n} (|a_k| + |b_k|)^p \le \left(\sum_{k=1}^{n} (|a_k| + |b_k|)^p\right)^{1/q} \left[\left(\sum_{k=1}^{n} |a_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n} |b_k|^p\right)^{1/p}\right].$$

Dividing both sides of this inequality by

$$\left(\sum_{k=1}^{n} (|a_k| + |b_k|)^p\right)^{1/q},$$

we get

$$\left(\sum_{k=1}^{n} (|a_k| + |b_k|)^p\right)^{1/p} \le \left(\sum_{k=1}^{n} |a_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n} |b_k|^p\right)^{1/p},$$

which immediately implies (14), since $|a_k + b_k| \le |a_k| + |b_k|$, thereby proving the triangle inequality in $R_p^n$.
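The metric (13) and the triangle inequality just proved can be illustrated as follows. This is a minimal sketch with a hypothetical function `rho_p`; it refuses exponents below 1, where (13) is no longer a metric (compare Problem 4), and the random check over a few exponents is an illustration, not a proof.

```python
import random


def rho_p(x, y, p):
    """Distance (13) in R_p^n; defined here only for p >= 1."""
    if p < 1:
        raise ValueError("(13) is a metric only for p >= 1")
    return sum(abs(xk - yk) ** p for xk, yk in zip(x, y)) ** (1 / p)


print(rho_p((0.0, 0.0), (3.0, 4.0), 1))  # 7.0
print(rho_p((0.0, 0.0), (3.0, 4.0), 2))  # 5.0

# Minkowski's inequality (14), in the form of the triangle inequality,
# spot-checked on random points for several exponents.
random.seed(2)
for p in (1, 1.5, 2, 3, 10):
    for _ in range(500):
        x, y, z = ([random.uniform(-1, 1) for _ in range(4)] for _pt in range(3))
        assert rho_p(x, z, p) <= rho_p(x, y, p) + rho_p(y, z, p) + 1e-12
```

The cases p = 1 and p = 2 reproduce the distances (4) and (1) of Examples 4 and 3.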
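Example 11. Finally let $l_p$ be the set of all infinite sequences

$$x = (x_1, x_2, \ldots, x_k, \ldots)$$

of real numbers satisfying the convergence condition

$$\sum_{k=1}^{\infty} |x_k|^p < \infty$$

for some fixed number $p \ge 1$, where distance between points is defined by

$$\rho(x, y) = \left(\sum_{k=1}^{\infty} |x_k - y_k|^p\right)^{1/p} \tag{20}$$

(the case p = 2 has already been considered in Example 7). It follows from
Minkowski's inequality (14) that

$$\left(\sum_{k=1}^{n} |x_k - y_k|^p\right)^{1/p} \le \left(\sum_{k=1}^{n} |x_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n} |y_k|^p\right)^{1/p} \tag{21}$$

for any n. Since the series

$$\sum_{k=1}^{\infty} |x_k|^p, \qquad \sum_{k=1}^{\infty} |y_k|^p$$

converge, by hypothesis, we can take the limit as $n \to \infty$ in (21), obtaining

$$\left(\sum_{k=1}^{\infty} |x_k - y_k|^p\right)^{1/p} \le \left(\sum_{k=1}^{\infty} |x_k|^p\right)^{1/p} + \left(\sum_{k=1}^{\infty} |y_k|^p\right)^{1/p}.$$

This shows that (20) actually makes sense for arbitrary $x, y \in l_p$. At the same
time, we have verified that the triangle inequality holds in $l_p$: applying the last inequality
to the points $x - y$ and $z - y$, which belong to $l_p$ by what has just been shown, we get
$\rho(x, z) \le \rho(x, y) + \rho(y, z)$ (the other two
properties of a metric are obviously satisfied). Therefore $l_p$ is a metric space.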

Remark. If $R = (X, \rho)$ is a metric space and M is any subset of X, then
obviously $R^* = (M, \rho)$ is again a metric space, called a subspace of the
original metric space R. This device gives us infinitely many more examples of
metric spaces.

5.2. Continuous mappings and homeomorphisms. Isometric spaces. Let f
be a mapping of one metric space X into another metric space Y, so that
f associates an element $y = f(x) \in Y$ with each element $x \in X$. Then f is
said to be continuous at the point $x_0 \in X$ if, given any $\varepsilon > 0$, there exists a
$\delta > 0$ such that

$$\rho'(f(x), f(x_0)) < \varepsilon$$

whenever

$$\rho(x, x_0) < \delta$$

(here $\rho$ is the metric in X and $\rho'$ the metric in Y). The mapping f is said
to be continuous on X if it is continuous at every point $x \in X$.

Remark. This definition reduces to the usual definition of continuity
familiar from calculus if X and Y are both numerical sets, i.e., if f is a real
function defined on some subset of the real line.

Given two metric spaces X and Y, let f be a one-to-one mapping of X onto
Y, and suppose f and $f^{-1}$ are both continuous. Then f is called a homeomorphic
mapping, or simply a homeomorphism (between X and Y). Two
spaces X and Y are said to be homeomorphic if there exists a homeomorphism
between them.

Example. The function

$$y = f(x) = \frac{2}{\pi} \arctan x$$

establishes a homeomorphism between the whole real line $(-\infty, \infty)$ and the
open interval $(-1, 1)$.
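The inverse of this homeomorphism is $x = \tan(\pi y / 2)$ on $(-1, 1)$, since tan inverts arctan on $(-\pi/2, \pi/2)$. The sketch below (hypothetical names `f` and `f_inv`; sample points chosen arbitrarily) checks numerically that f maps into $(-1, 1)$ and that the two maps undo each other. Continuity of both maps is, of course, a matter of calculus rather than of such a check.

```python
import math


def f(x):
    """f(x) = (2/pi) arctan x, a one-to-one continuous map of R^1 onto (-1, 1)."""
    return 2 / math.pi * math.atan(x)


def f_inv(y):
    """The inverse map y -> tan(pi*y/2), continuous on (-1, 1)."""
    if not -1 < y < 1:
        raise ValueError("y must lie in the open interval (-1, 1)")
    return math.tan(math.pi * y / 2)


for x in (-1000.0, -2.5, 0.0, 1.0, 40.0):
    y = f(x)
    assert -1 < y < 1
    assert math.isclose(f_inv(y), x, rel_tol=1e-9, abs_tol=1e-12)
```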

Definition 2. A one-to-one mapping f of one metric space $R = (X, \rho)$
onto another metric space $R' = (Y, \rho')$ is said to be an isometric mapping
(or simply an isometry) if

$$\rho(x_1, x_2) = \rho'(f(x_1), f(x_2))$$

for all $x_1, x_2 \in R$. Correspondingly, the spaces R and R' are said to be
isometric (to each other).

Thus if R and R' are isometric, the “metric relations” between the 
elements of R are the same as those between the elements of R' , i.e., R and 
R' differ only in the explicit nature of their elements (this distinction is 
unimportant from the standpoint of metric space theory). From now on, 
we will not distinguish between isometric spaces, regarding them simply as 
identical. 
Remark. We will discuss continuity and homeomorphisms from a more
general point of view in Sec. 9.6. 

Problem 1. Given a metric space $(X, \rho)$, prove that

a) $|\rho(x, z) - \rho(y, u)| \le \rho(x, y) + \rho(z, u)$ $(x, y, z, u \in X)$;

b) $|\rho(x, z) - \rho(y, z)| \le \rho(x, y)$ $(x, y, z \in X)$.


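Problem 2. Verify that

$$\left(\sum_{k=1}^{n} a_k b_k\right)^2 = \sum_{k=1}^{n} a_k^2 \sum_{k=1}^{n} b_k^2 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (a_i b_j - a_j b_i)^2.$$

Deduce the Cauchy-Schwarz inequality (3) from this identity.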
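The identity of Problem 2 can be confirmed exactly on sample integer vectors, using Python's `fractions.Fraction` for the factor 1/2 so that no rounding is involved. This is a check on examples of our choosing, not the requested verification; the helper name `lagrange_sides` is hypothetical.

```python
import random
from fractions import Fraction


def lagrange_sides(a, b):
    """Both sides of the identity in Problem 2, computed exactly."""
    n = len(a)
    left = sum(ak * bk for ak, bk in zip(a, b)) ** 2
    right = (sum(ak * ak for ak in a) * sum(bk * bk for bk in b)
             - Fraction(1, 2) * sum((a[i] * b[j] - a[j] * b[i]) ** 2
                                    for i in range(n) for j in range(n)))
    return left, right


left, right = lagrange_sides([1, 2, 3], [4, 5, 6])
print(left, left == right)  # 1024 True

random.seed(3)
for _ in range(200):
    a = [random.randint(-9, 9) for _ in range(6)]
    b = [random.randint(-9, 9) for _ in range(6)]
    left, right = lagrange_sides(a, b)
    assert left == right
```

Since the subtracted double sum is a sum of squares, the identity makes the Cauchy-Schwarz inequality (3) visible at once.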

Problem 3. Verify that

$$\left(\int_a^b x(t)\,y(t)\,dt\right)^2 = \int_a^b x^2(t)\,dt \int_a^b y^2(t)\,dt - \frac{1}{2}\int_a^b\!\!\int_a^b [x(s)y(t) - y(s)x(t)]^2\,ds\,dt.$$

Deduce Schwarz's inequality (11) from this identity.

Problem 4. What goes wrong in Example 10, p. 41 if p < 1?

Hint. Show that Minkowski’s inequality fails for p < 1. 

Problem 5. Prove that the metric (5) is the limiting case of the metric (13)
in the sense that

$$\rho_0(x, y) = \max_{1 \le k \le n} |x_k - y_k| = \lim_{p \to \infty} \left(\sum_{k=1}^{n} |x_k - y_k|^p\right)^{1/p}.$$
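The limit in Problem 5 can be watched numerically: for a fixed pair of points, $\rho_p$ decreases toward $\rho_0$ as p grows. The sketch below (hypothetical function names, sample points of our choosing) only illustrates the statement; it stops at p = 128 because much larger exponents overflow floating-point arithmetic.

```python
def rho_p(x, y, p):
    """Distance (13): (sum |x_k - y_k|^p)^(1/p)."""
    return sum(abs(xk - yk) ** p for xk, yk in zip(x, y)) ** (1 / p)


def rho_0(x, y):
    """Distance (5): max |x_k - y_k|."""
    return max(abs(xk - yk) for xk, yk in zip(x, y))


x, y = (0.0, 0.0, 0.0), (1.0, -3.0, 2.0)
for p in (1, 2, 4, 16, 64, 128):
    # Very large p can overflow floating point, so we stop at p = 128.
    print(p, rho_p(x, y, p), rho_0(x, y))
```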

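Problem 6. Starting from the inequality (19), deduce Hölder's integral
inequality

$$\int_a^b |x(t)\,y(t)|\,dt \le \left(\int_a^b |x(t)|^p\,dt\right)^{1/p} \left(\int_a^b |y(t)|^q\,dt\right)^{1/q} \qquad \left(\frac{1}{p} + \frac{1}{q} = 1\right),$$

valid for any functions x(t) and y(t) such that the integrals on the right exist.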

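Problem 7. Use Hölder's integral inequality to prove Minkowski's integral
inequality

$$\left(\int_a^b |x(t) + y(t)|^p\,dt\right)^{1/p} \le \left(\int_a^b |x(t)|^p\,dt\right)^{1/p} + \left(\int_a^b |y(t)|^p\,dt\right)^{1/p} \qquad (p \ge 1).$$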

Problem 8. Exhibit an isometry between the spaces $C_{[0,1]}$ and $C_{[1,2]}$.


6. Convergence. Open and Closed Sets 

6.1. Closure of a set. Limit points. By the open sphere (or open ball)
$S(x_0, r)$ in a metric space R we mean the set of points $x \in R$ satisfying the
inequality

$$\rho(x_0, x) < r$$

($\rho$ is the metric of R).¹ The fixed point $x_0$ is called the center of the sphere,
and the number r is called its radius. By the closed sphere (or closed ball)
$S[x_0, r]$ with center $x_0$ and radius r we mean the set of points $x \in R$ satisfying
the inequality

$$\rho(x_0, x) \le r.$$

An open sphere of radius $\varepsilon$ with center $x_0$ will also be called an $\varepsilon$-neighborhood
of $x_0$, denoted by $O_\varepsilon(x_0)$.

A point $x \in R$ is called a contact point of a set $M \subset R$ if every neighborhood
of x contains at least one point of M. The set of all contact points of a
set M is denoted by [M] and is called the closure of M. Obviously $M \subset [M]$,
since every point of M is a contact point of M. By the closure operator in
a metric space R, we mean the mapping of the set of all subsets of R into itself
carrying each set $M \subset R$ into its closure [M].

Theorem 1. The closure operator has the following properties:

1) If $M \subset N$, then $[M] \subset [N]$;

2) $[[M]] = [M]$;

3) $[M \cup N] = [M] \cup [N]$;

4) $[\emptyset] = \emptyset$.

Proof. Property 1) is obvious. To prove property 2), let $x \in [[M]]$.
Then any given neighborhood $O_\varepsilon(x)$ contains a point $x_1 \in [M]$. Consider
the sphere $O_{\varepsilon_1}(x_1)$ of radius

$$\varepsilon_1 = \varepsilon - \rho(x, x_1).$$

Clearly $O_{\varepsilon_1}(x_1)$ is contained in $O_\varepsilon(x)$. In fact, if $z \in O_{\varepsilon_1}(x_1)$, then
$\rho(z, x_1) < \varepsilon_1$ and hence, since $\rho(x, x_1) = \varepsilon - \varepsilon_1$, it follows from the
triangle inequality that

$$\rho(z, x) < \varepsilon_1 + (\varepsilon - \varepsilon_1) = \varepsilon,$$

i.e., $z \in O_\varepsilon(x)$. Since $x_1 \in [M]$, there is a point $x_2 \in M$ in $O_{\varepsilon_1}(x_1)$. But
then $x_2 \in O_\varepsilon(x)$ and hence $x \in [M]$, since $O_\varepsilon(x)$ is an arbitrary neighborhood
of x. Therefore $[[M]] \subset [M]$. But obviously $[M] \subset [[M]]$ and
hence $[[M]] = [M]$, as required.

To prove property 3), let $x \in [M \cup N]$ and suppose $x \notin [M] \cup [N]$.
Then $x \notin [M]$ and $x \notin [N]$. But then there exist neighborhoods $O_{\varepsilon_1}(x)$
and $O_{\varepsilon_2}(x)$ such that $O_{\varepsilon_1}(x)$ contains no points of M while $O_{\varepsilon_2}(x)$ contains
no points of N. It follows that the neighborhood $O_\varepsilon(x)$, where
$\varepsilon = \min\{\varepsilon_1, \varepsilon_2\}$, contains no points of either M or N, and hence no points
of $M \cup N$, contrary to the assumption that $x \in [M \cup N]$. Therefore
$x \in [M] \cup [N]$, and hence

$$[M \cup N] \subset [M] \cup [N], \tag{1}$$

since x is an arbitrary point of $[M \cup N]$. On the other hand, since
$M \subset M \cup N$ and $N \subset M \cup N$, it follows from property 1) that
$[M] \subset [M \cup N]$ and $[N] \subset [M \cup N]$. But then

$$[M] \cup [N] \subset [M \cup N],$$

which together with (1) implies $[M \cup N] = [M] \cup [N]$.

Finally, to prove property 4), we observe that given any $M \subset R$,

$$[M] = [M \cup \emptyset] = [M] \cup [\emptyset],$$

by property 3). It follows that $[\emptyset] \subset [M]$. But this is possible for
arbitrary M only if $[\emptyset] = \emptyset$. (Alternatively, the set with no elements
can have no contact points!) ■

1 Any confusion between “sphere” meant in the sense of spherical surface and “sphere”
meant in the sense of a solid sphere (or ball) will always be avoided by judicious use of the
adjectives “open” or “closed.”

A point $x \in R$ is called a limit point of a set $M \subset R$ if every neighborhood
of x contains infinitely many points of M. The limit point may or may not 
belong to M. For example, if M is the set of rational numbers in the interval 
[0, 1], then every point of [0, 1], rational or not, is a limit point of M. 

A point x belonging to a set M is called an isolated point of M if there 
is a (“sufficiently small”) neighborhood of x containing no points of M other 
than x itself. 

6.2. Convergence and limits. A sequence of points $\{x_n\} = x_1, x_2, \ldots, x_n, \ldots$
in a metric space R is said to converge to a point $x \in R$ if every
neighborhood $O_\varepsilon(x)$ of x contains all points $x_n$ starting from a certain index
(more exactly, if, given any $\varepsilon > 0$, there is an integer $N_\varepsilon$ such that $O_\varepsilon(x)$
contains all points $x_n$ with $n > N_\varepsilon$). The point x is called the limit of the
sequence $\{x_n\}$, and we write $x_n \to x$ (as $n \to \infty$). Clearly, $\{x_n\}$ converges to
x if and only if

$$\lim_{n \to \infty} \rho(x, x_n) = 0.$$

It is an immediate consequence of the definition of a limit that

1) No sequence can have two distinct limits;

2) If a sequence $\{x_n\}$ converges to a point x, then so does every subsequence of $\{x_n\}$

(give the details).
Theorem 2. A necessary and sufficient condition for a point x to be a
contact point of a set M is that there exist a sequence $\{x_n\}$ of points of M
converging to x.

Proof. The condition is necessary, since if x is a contact point of M,
then every neighborhood $O_{1/n}(x)$ contains at least one point $x_n \in M$,
and these points form a sequence $\{x_n\}$ converging to x. The sufficiency
is obvious. ■
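As a concrete instance of the proof of Theorem 2, take M to be the set of rational points of $R^1$ and $x = \sqrt{2}$, a contact point of M. One possible choice (ours, not the text's) of a point $x_n \in M \cap O_{1/n}(x)$ is $x_n = \lfloor n\sqrt{2} \rfloor / n$. The sketch below computes it exactly with Python's `math.isqrt` and `fractions.Fraction`; the helper name `rational_in_ball` is hypothetical.

```python
import math
from fractions import Fraction


def rational_in_ball(n):
    """A rational point x_n with 0 <= sqrt(2) - x_n < 1/n, namely
    floor(n*sqrt(2))/n; math.isqrt(2*n*n) computes floor(n*sqrt(2)) exactly."""
    return Fraction(math.isqrt(2 * n * n), n)


for n in (1, 10, 100, 1000):
    xn = rational_in_ball(n)
    # x_n <= sqrt(2) < x_n + 1/n, checked exactly by squaring (x_n >= 0).
    assert xn * xn <= 2 < (xn + Fraction(1, n)) ** 2
    print(n, xn)
```

Because $\rho(x, x_n) < 1/n$ for every n, the sequence $\{x_n\}$ converges to $\sqrt{2}$, exactly as in the proof.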

Theorem 2'. A necessary and sufficient condition for a point x to be a
limit point of a set M is that there exist a sequence $\{x_n\}$ of distinct points
of M converging to x.

Proof. Clearly, if x is a limit point of M, then the points
$x_n \in O_{1/n}(x) \cap M$ figuring in the proof of Theorem 2 can be chosen to be
distinct. This proves the necessity, and the sufficiency is again obvious. ■

6.3. Dense subsets. Separable spaces. Let $A$ and $B$ be two subsets of a metric space $R$. Then $A$ is said to be dense in $B$ if $[A] \supset B$. In particular, $A$ is said to be everywhere dense (in $R$) if $[A] = R$. A set $A$ is said to be nowhere dense if it is dense in no (open) sphere at all.

Example 1. The set of all rational points is dense in the real line $R^1$.

Example 2. The set of all points $x = (x_1, x_2, \ldots, x_n)$ with rational coordinates is dense in each of the spaces $R^n$, $R_0^n$ and $R_1^n$ introduced in Examples 3-5, pp. 38-39.

Example 3. The set of all points $x = (x_1, x_2, \ldots, x_k, \ldots)$ with only finitely many nonzero coordinates, each a rational number, is dense in the space $l_2$ introduced in Example 7, p. 39.

Example 4. The set of all polynomials with rational coefficients is dense in both spaces $C_{[a,b]}$ and $C^2_{[a,b]}$ introduced in Examples 6 and 8, pp. 39 and 40.

Definition. A metric space is said to be separable if it has a countable 
everywhere dense subset. 

Example 5. The spaces $R^1$, $R^n$, $R_0^n$, $R_1^n$, $l_2$, $C_{[a,b]}$, and $C^2_{[a,b]}$ are all separable, since the sets in Examples 1-4 above are all countable.

Example 6. The “discrete space” $M$ described in Example 1, p. 38 contains a countable everywhere dense subset (and hence is separable) if and only if it is itself a countable set, since every subset $A$ of $M$ satisfies $[A] = A$, so that the only everywhere dense subset of $M$ is $M$ itself.

Example 7. There is no countable everywhere dense set in the space $m$ of all bounded sequences, introduced in Example 9, p. 41. In fact, consider the set $E$ of all sequences consisting exclusively of zeros and ones. Clearly, $E$ has the power of the continuum (recall Theorem 6, Sec. 2.5), since there is a one-to-one correspondence between $E$ and the set of all subsets of the set $Z^+ = \{1, 2, \ldots, n, \ldots\}$ (describe the correspondence). According to formula (12), p. 41, the distance between any two distinct points of $E$ equals 1. Suppose we surround each point of $E$ by an open sphere of radius $\frac{1}{2}$, thereby obtaining an uncountably infinite family of pairwise disjoint spheres. Then if some set $M$ is everywhere dense in $m$, there must be at least one point of $M$ in each of the spheres. It follows that $M$ cannot be countable and hence that $m$ cannot be separable.
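The key fact in Example 7, that distinct 0-1 sequences lie at distance exactly 1 from each other in $m$, can be checked by brute force on finite truncations. The sketch below is only an illustration under two assumptions of ours: it uses the sup metric $\rho(x, y) = \sup_k |x_k - y_k|$ of $m$, applied to sequences cut off after `n` terms, and it encodes a subset of $Z^+$ by its 0-1 indicator sequence. The helper names `sup_distance` and `indicator` are hypothetical.

```python
from itertools import product

def sup_distance(x, y):
    """Sup-metric distance between two equal-length (truncated) bounded sequences."""
    return max(abs(a - b) for a, b in zip(x, y))

def indicator(subset, n):
    """First n terms of the 0-1 sequence of a subset of Z+ = {1, 2, ...}."""
    return tuple(1 if k in subset else 0 for k in range(1, n + 1))

n = 4
E = list(product((0, 1), repeat=n))  # all 0-1 sequences of length n
# every pair of distinct 0-1 sequences is at sup-distance exactly 1,
# so the open spheres of radius 1/2 around them are pairwise disjoint
distances = {sup_distance(x, y) for x in E for y in E if x != y}
print(distances)               # {1}
print(indicator({1, 3}, n))    # (1, 0, 1, 0)
```

Truncation only makes the check finite. The argument in the text needs infinitely many coordinates, which gives uncountably many such sequences.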

6.4. Closed sets. We say that a subset $M$ of a metric space $R$ is closed if it coincides with its own closure, i.e., if $[M] = M$. In other words, a set is called closed if it contains all its limit points (see Problem 2).

Example 1. The empty set $\varnothing$ and the whole space $R$ are closed sets.

Example 2. Every closed interval [a, b] on the real line is a closed set. 

Example 3. Every closed sphere in a metric space is a closed set. In particular, the set of all functions $f$ in the space $C_{[a,b]}$ such that $|f(t)| \le K$ (where $K$ is a constant) is closed.

Example 4. The set of all functions $f$ in $C_{[a,b]}$ such that $|f(t)| < K$ (an open sphere) is not closed. The closure of this set is the closed sphere in the preceding example.

Example 5. Any set consisting of a finite number of points is closed. 

Theorem 3. The intersection of an arbitrary number of closed sets is 
closed. The union of a finite number of closed sets is closed. 

Proof. Given arbitrary sets $F_\alpha$ indexed by a parameter $\alpha$, let $x$ be a limit point of the intersection

$$F = \bigcap_{\alpha} F_\alpha.$$

Then any neighborhood $O_\varepsilon(x)$ contains infinitely many points of $F$, and hence infinitely many points of each $F_\alpha$. Therefore $x$ is a limit point of each $F_\alpha$ and hence belongs to each $F_\alpha$, since the sets $F_\alpha$ are all closed. It follows that $x \in F$, and hence that $F$ itself is closed.

Next let

$$F = \bigcup_{k=1}^{n} F_k$$

be the union of a finite number of closed sets $F_k$, and suppose $x$ does not belong to $F$. Then $x$ does not belong to any of the sets $F_k$, and hence cannot be a limit point of any of them. But then, for every $k$, there is a neighborhood $O_{\varepsilon_k}(x)$ containing no more than a finite number of points of $F_k$. Choosing

$$\varepsilon = \min\{\varepsilon_1, \ldots, \varepsilon_n\},$$

we get a neighborhood $O_\varepsilon(x)$ containing no more than a finite number of points of $F$, so that $x$ cannot be a limit point of $F$. This proves that a point $x \notin F$ cannot be a limit point of $F$. Therefore $F$ is closed. ∎

6.5. Open sets. A point $x$ is called an interior point of a set $M$ if $x$ has a neighborhood $O_\varepsilon(x) \subset M$, i.e., a neighborhood consisting entirely of points of $M$. A set is said to be open if its points are all interior points.

Example 1. Every open interval $(a, b)$ on the real line is an open set. In fact, if $a < x < b$, choose $\varepsilon = \min\{x - a, b - x\}$. Then clearly $O_\varepsilon(x) \subset (a, b)$.

Example 2. Every open sphere $S(a, r)$ in a metric space is an open set. In fact, $x \in S(a, r)$ implies $\rho(a, x) < r$. Hence, choosing $\varepsilon = r - \rho(a, x)$, we have $O_\varepsilon(x) = S(x, \varepsilon) \subset S(a, r)$.

Example 3. Let $M$ be the set of all functions $f$ in $C_{[a,b]}$ such that $f(t) < g(t)$ for all $t \in [a, b]$, where $g$ is a fixed function in $C_{[a,b]}$. Then $M$ is an open subset of $C_{[a,b]}$.

Theorem 4. A subset $M$ of a metric space $R$ is open if and only if its complement $R - M$ is closed.

Proof. If $M$ is open, then every point $x \in M$ has a neighborhood (entirely) contained in $M$. Therefore no point $x \in M$ can be a contact point of $R - M$. In other words, if $x$ is a contact point of $R - M$, then $x \in R - M$, i.e., $R - M$ is closed.

Conversely, if $R - M$ is closed, then any point $x \in M$ must have a neighborhood contained in $M$, since otherwise every neighborhood of $x$ would contain points of $R - M$, i.e., $x$ would be a contact point of $R - M$ not in $R - M$. Therefore $M$ is open. ∎

Corollary. The empty set $\varnothing$ and the whole space $R$ are open sets.

Proof. An immediate consequence of Theorem 4 and Example 1, Sec. 6.4. ∎

Theorem 5. The union of an arbitrary number of open sets is open. The 
intersection of a finite number of open sets is open. 

Proof. This is the “dual” of Theorem 3. The proof is an immediate consequence of Theorem 4 and formulas (3)-(4), p. 4. ∎



6.6. Open and closed sets on the real line. The structure of open and closed sets in a given metric space can be quite complicated. This is true even for open and closed sets in a Euclidean space of two or more dimensions ($R^n$, $n \ge 2$). In the one-dimensional case, however, it is an easy matter to give a complete description of all open sets (and hence of all closed sets):

Theorem 6. Every open set $G$ on the real line is the union of a finite or countable system of pairwise disjoint open intervals.⁵

Proof. Let $x$ be an arbitrary point of $G$. By the definition of an open set, there is at least one open interval containing $x$ and contained in $G$. Let $I_x$ be the union of all such open intervals. Then, as we now show, $I_x$ is itself an open interval. In fact, let⁶

$$a = \inf I_x, \qquad b = \sup I_x$$

(where we allow the cases $a = -\infty$ and $b = +\infty$). Then obviously

$$I_x \subset (a, b). \qquad (2)$$

Moreover, suppose $y$ is an arbitrary point of $(a, b)$ distinct from $x$, where, to be explicit, we assume that $a < y < x$. Then there is a point $y' \in I_x$ such that $a < y' < y$ (why?). Hence $G$ contains an open interval containing the points $y'$ and $x$. But then this interval also contains $y$, i.e., $y \in I_x$. (The case $y > x$ is treated similarly.) Moreover, the point $x$ belongs to $I_x$, by hypothesis. It follows that $I_x \supset (a, b)$, and hence by (2) that $I_x = (a, b)$. Thus $I_x$ is itself an open interval, as asserted, in fact the open interval $(a, b)$.

By its very construction, the interval $(a, b)$ is contained in $G$ and is not a subset of a larger interval contained in $G$. Moreover, it is clear that two intervals $I_x$ and $I_{x'}$ corresponding to distinct points $x$ and $x'$ either coincide or else are disjoint (otherwise $I_x$ and $I_{x'}$ would both be contained in the larger interval $I_x \cup I_{x'} \subset G$). There are no more than countably many such pairwise disjoint intervals $I_x$. In fact, choosing an arbitrary rational point in each $I_x$, we establish a one-to-one correspondence between the intervals $I_x$ and a subset of the rational numbers. Finally, it is obvious that

$$G = \bigcup_{x} I_x. \qquad \blacksquare$$
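When $G$ is given as a finite union of bounded open intervals, the maximal intervals $I_x$ of the proof can be computed by sorting the intervals by left end point and merging those that overlap. The sketch below assumes exactly that input format: a list of pairs `(a, b)` with `a < b`, a representation chosen for illustration. The helper name `components` is hypothetical. Two intervals are merged only if they genuinely overlap, because $(0, 1)$ and $(1, 2)$ both omit the point 1 and therefore give two separate intervals $I_x$.

```python
def components(intervals):
    """Split a finite union of open intervals (a, b) into its maximal disjoint open intervals."""
    result = []
    for a, b in sorted(intervals):
        if result and a < result[-1][1]:
            # (a, b) overlaps the current maximal interval: extend it to the right if needed
            last_a, last_b = result[-1]
            result[-1] = (last_a, max(last_b, b))
        else:
            # (a, b) starts at or beyond the right end point, which is not in G: new interval
            result.append((a, b))
    return result

G = [(5, 6), (0, 2), (1, 3), (3, 4), (2.5, 2.8)]
print(components(G))  # [(0, 3), (3, 4), (5, 6)]
```

Here the point 3 belongs to none of the given intervals, so $(0, 3)$ and $(3, 4)$ remain distinct components even though they share an end point.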

Corollary. Every closed set on the real line can be obtained by deleting a finite or countable system of pairwise disjoint intervals from the line.

Proof. An immediate consequence of Theorems 4 and 6. ∎

⁵ The infinite intervals $(-\infty, \infty)$, $(a, \infty)$, and $(-\infty, b)$ are regarded as open.

⁶ Given a set of real numbers $E$, $\inf E$ denotes the greatest lower bound or infimum of $E$, while $\sup E$ denotes the least upper bound or supremum of $E$.

Example 1. Every closed interval [a, b] is a closed set (here a and b are 
necessarily finite). 

Example 2. Every single-element set $\{x_0\}$ is closed.

Example 3. The union of a finite number of closed intervals and single-element sets is a closed set.

Example 4 (The Cantor set). A more interesting example of a closed set on the line can be constructed as follows: Delete the open interval $(\frac{1}{3}, \frac{2}{3})$ from the closed interval $F_0 = [0, 1]$, and let $F_1$ denote the remaining closed set, consisting of two closed intervals. Then delete the open intervals $(\frac{1}{9}, \frac{2}{9})$ and $(\frac{7}{9}, \frac{8}{9})$ from $F_1$, and let $F_2$ denote the remaining closed set, consisting of four closed intervals. Then delete the “middle third” from each of these four intervals, getting a new closed set $F_3$, and so on (see Figure 9). Continuing this process indefinitely, we get a sequence of closed sets $F_n$ such that

$$F_0 \supset F_1 \supset F_2 \supset \cdots \supset F_n \supset \cdots$$

(such a sequence is said to be decreasing). The intersection

$$F = \bigcap_{n=0}^{\infty} F_n$$

of all these sets is called the Cantor set. Clearly $F$ is closed, by Theorem 3, and is obtained from the unit interval $[0, 1]$ by deleting a countable number of open intervals. In fact, at the $n$th stage of the construction, we delete $2^{n-1}$ intervals, each of length $1/3^n$.
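The bookkeeping in this construction can be verified with exact rational arithmetic. The sketch below, with hypothetical helper names `cantor_stage` and `deleted_length`, lists the closed intervals making up $F_n$ and sums the lengths deleted in the first $n$ stages. That sum should equal $\sum_{k=1}^{n} 2^{k-1}/3^k = 1 - (2/3)^n$, so $F_n$ consists of $2^n$ intervals of total length $(2/3)^n$.

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals (a, b) making up F_n, starting from F_0 = [0, 1], as exact fractions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        new = []
        for a, b in intervals:
            third = (b - a) / 3
            # keep the left and right closed thirds, delete the open middle third
            new.append((a, a + third))
            new.append((b - third, b))
        intervals = new
    return intervals

def deleted_length(n):
    """Total length removed in stages 1..n: 2**(k-1) intervals of length 1/3**k at stage k."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))

F2 = cantor_stage(2)
print(F2)                 # [0, 1/9], [2/9, 1/3], [2/3, 7/9], [8/9, 1] as pairs of Fractions
print(deleted_length(3))  # 19/27
```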

To describe the structure of the set $F$, we first note that $F$ contains the points

$$0, \ 1, \ \frac{1}{3}, \ \frac{2}{3}, \ \frac{1}{9}, \ \frac{2}{9}, \ \frac{7}{9}, \ \frac{8}{9}, \ \ldots, \qquad (3)$$

i.e., the end points of the deleted intervals (together with the points 0 and 1).



Figure 9 
However $F$ contains many other points. In fact, given any $x \in [0, 1]$, suppose we write $x$ in ternary notation, representing $x$ as a series

$$x = \frac{a_1}{3} + \frac{a_2}{3^2} + \cdots + \frac{a_n}{3^n} + \cdots, \qquad (4)$$

where each of the numbers $a_1, a_2, \ldots, a_n, \ldots$ can only take one of the three values 0, 1, 2. Then it is easy to see that $x$ belongs to $F$ if and only if $x$ has a representation (4) such that none of the numbers $a_1, a_2, \ldots, a_n, \ldots$ equals 1 (think things through).⁷
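The ternary criterion can also be checked mechanically for rational points. The sketch below, with the hypothetical helper `in_cantor`, uses exact fractions and follows the construction rather than the digits. A point of the open middle third $(\frac{1}{3}, \frac{2}{3})$ is deleted. Otherwise, the outer third containing the point is stretched back onto $[0, 1]$ by $x \mapsto 3x$ or $x \mapsto 3x - 2$, which shifts the ternary digits one place to the left. For rational $x = p/q$, every stretched point again has the form $k/q$, so only finitely many values can occur and the loop must eventually revisit a point. If no deletion has happened by then, none ever will, and $x \in F$. The method is limited to rational inputs.

```python
from fractions import Fraction

def in_cantor(x):
    """Decide whether a rational number x belongs to the Cantor set F."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:
        seen.add(x)
        if Fraction(1, 3) < x < Fraction(2, 3):
            return False  # x lies in a deleted open middle third
        # stretch the outer third containing x back onto [0, 1]
        x = 3 * x if x <= Fraction(1, 3) else 3 * x - 2
    return True  # the orbit repeats without ever entering a deleted interval

print([in_cantor(Fraction(k, 4)) for k in range(5)])  # [True, True, False, True, True]
```

The point $\frac{1}{4}$ is a useful test case: it is carried to $\frac{3}{4}$ and back again, so it is never deleted, although it is not an end point of any deleted interval.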

Remarkably enough, the set $F$ has the power of the continuum, i.e., there are as many points in $F$ as in the whole interval $[0, 1]$, despite the fact that the sum of the lengths of the deleted intervals equals

$$\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = 1.$$


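To see this, we associate a new point

$$y = \frac{b_1}{2} + \frac{b_2}{2^2} + \cdots + \frac{b_n}{2^n} + \cdots$$

with each point (4), where⁸

$$b_n = \begin{cases} 0 & \text{if } a_n = 0, \\ 1 & \text{if } a_n = 2. \end{cases}$$

In this way, we obtain a mapping of $F$ onto the whole interval $[0, 1]$, since every $y \in [0, 1]$ has a binary expansion and hence arises from some point of $F$. The mapping fails to be one-to-one only at countably many points: the two end points of a deleted interval, such as $\frac{1}{3}$ and $\frac{2}{3}$, go into the same point (here $\frac{1}{2}$). Since $F$ is mapped onto $[0, 1]$ and is itself a subset of $[0, 1]$, it follows that $F$ has the power of the continuum, as asserted. Let $A_1$ be the set of points (3). Then $F = A_1 \cup A_2$, where the set $A_2 = F - A_1$ is uncountable, since $A_1$ is countable and $F$ itself is not. The points of $A_1$ are often called “points (of $F$) of the first kind,” while those of $A_2$ are called “points of the second kind.”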
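The digit map $a_n \mapsto b_n$ can be tried out on finite digit strings. The sketch below, with the hypothetical helper `cantor_to_unit`, accepts only finitely many ternary digits, each 0 or 2, and returns the exact pair $(x, y)$ with $x = \sum a_n/3^n$ and $y = \sum b_n/2^n$. It also shows how the map glues together the end points of a deleted interval. The point $\frac{2}{3} = 0.2000\ldots$ (ternary) goes into $\frac{1}{2}$, while the truncations of $\frac{1}{3} = 0.0222\ldots$ (ternary) are carried into numbers that also tend to $\frac{1}{2}$.

```python
from fractions import Fraction

def cantor_to_unit(digits):
    """Map finitely many ternary digits a_n in {0, 2} to (x, y) with x = sum a_n/3**n, y = sum b_n/2**n."""
    if any(a not in (0, 2) for a in digits):
        raise ValueError("digits of a point of F must be 0 or 2")
    x = sum(Fraction(a, 3 ** n) for n, a in enumerate(digits, start=1))
    # b_n = a_n / 2, i.e. 0 -> 0 and 2 -> 1
    y = sum(Fraction(a // 2, 2 ** n) for n, a in enumerate(digits, start=1))
    return x, y

print(cantor_to_unit([2]))              # (Fraction(2, 3), Fraction(1, 2))
x, y = cantor_to_unit([0] + [2] * 20)   # truncation of 1/3 = 0.0222... (ternary)
print(float(x), float(y))               # close to 1/3 and to 1/2 respectively
```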


⁷ Just as in the case of ordinary decimals, certain numbers can be written in two distinct ways. For example,

$$\frac{1}{3} = \frac{1}{3} + \frac{0}{3^2} + \frac{0}{3^3} + \cdots = \frac{0}{3} + \frac{2}{3^2} + \frac{2}{3^3} + \cdots.$$

Since none of the numerators in the second representation equals 1, the point $\frac{1}{3}$ belongs to $F$ (this is already obvious from the construction of $F$).

⁸ If a point $x$ of $F$ has two representations of the form (4), then one and only one of them has no numerators $a_1, a_2, \ldots$ equal to 1. These are the numbers used to define $b_n$.

Problem 1. Give an example of a metric space $R$ and two open spheres $S(x, r_1)$ and $S(y, r_2)$ in $R$ such that $S(x, r_1) \subset S(y, r_2)$ although $r_1 > r_2$.

Problem 2. Prove that every contact point of a set $M$ is either a limit point of $M$ or an isolated point of $M$.

Comment. In particular, $[M]$ can only contain points of the following three types:

a) Limit points of $M$ belonging to $M$;

b) Limit points of $M$ which do not belong to $M$;

c) Isolated points of $M$.

Thus $[M]$ is the union of $M$ and the set of all its limit points.

Problem 3. Prove that if $x_n \to x$, $y_n \to y$ as $n \to \infty$, then $\rho(x_n, y_n) \to \rho(x, y)$.

Hint. Use Problem 1a, p. 45.

Problem 4. Let $f$ be a mapping of one metric space $X$ into another metric space $Y$. Prove that $f$ is continuous at a point $x_0$ if and only if the sequence $\{y_n\} = \{f(x_n)\}$ converges to $y = f(x_0)$ whenever the sequence $\{x_n\}$ converges to $x_0$.

Problem 5. Prove that 

a) The closure of any set M is a closed set; 

b) [M] is the smallest closed set containing M. 

Problem 6. Is the union of infinitely many closed sets necessarily closed? How about the intersection of infinitely many open sets? Give examples.

Problem 7. Prove directly that the point $\frac{1}{4}$ belongs to the Cantor set $F$, although it is not an end point of any of the open intervals deleted in constructing $F$.

Hint. The point $\frac{1}{4}$ divides the interval $[0, 1]$ in the ratio 1:3. It also divides the interval $[0, \frac{1}{3}]$ left after deleting $(\frac{1}{3}, \frac{2}{3})$ in the ratio 3:1, and so on.

Problem 8. Let $F$ be the Cantor set. Prove that

a) The points of the first kind, i.e., the points (3), form an everywhere dense subset of $F$;

b) The numbers of the form $t_1 + t_2$, where $t_1, t_2 \in F$, fill the whole interval $[0, 2]$.

Problem 9. Given a metric space $R$, let $A$ be a subset of $R$ and $x$ a point of $R$. Then the number

$$\rho(A, x) = \inf_{a \in A} \rho(a, x)$$

is called the distance between $A$ and $x$. Prove that

a) $x \in A$ implies $\rho(A, x) = 0$, but not conversely;

b) $\rho(A, x)$ is a continuous function of $x$ (for fixed $A$);

c) $\rho(A, x) = 0$ if and only if $x$ is a contact point of $A$;

d) $[A] = A \cup M$, where $M$ is the set of all points $x$ such that $\rho(A, x) = 0$.
Problem 10. Let $A$ and $B$ be two subsets of a metric space $R$. Then the number

$$\rho(A, B) = \inf_{a \in A,\ b \in B} \rho(a, b)$$

is called the distance between $A$ and $B$. Show that $\rho(A, B) = 0$ if $A \cap B \ne \varnothing$, but not conversely.

Problem 11. Let $M_K$ be the set of all functions $f$ in $C_{[a,b]}$ satisfying a Lipschitz condition, i.e., the set of all $f$ such that

$$|f(t_1) - f(t_2)| \le K|t_1 - t_2|$$

for all $t_1, t_2 \in [a, b]$, where $K$ is a fixed positive number. Prove that

a) $M_K$ is closed and in fact is the closure of the set of all differentiable functions on $[a, b]$ such that $|f'(t)| \le K$;

b) The set

$$M = \bigcup_K M_K$$

of all functions satisfying a Lipschitz condition for some $K$ is not closed;

c) The closure of $M$ is the whole space $C_{[a,b]}$.

Problem 12. An open set $G$ in $n$-dimensional Euclidean space $R^n$ is said to be connected if any points $x, y \in G$ can be joined by a polygonal line⁹ lying entirely in $G$. For example, the (open) disk $x^2 + y^2 < 1$ is connected, but not the union of the two disks

$$x^2 + y^2 < 1, \qquad (x - 2)^2 + y^2 < 1$$

(even though they share a contact point). An open subset of an open set $G$ is called a component of $G$ if it is connected and is not contained in a larger connected subset of $G$. Use Zorn’s lemma to prove that every open set $G$ in $R^n$ is the union of no more than countably many pairwise disjoint components.

Comment. In the case $n = 1$ (i.e., on the real line) every connected open set is an open interval, possibly one of the infinite intervals $(-\infty, \infty)$, $(a, \infty)$, $(-\infty, b)$. Thus Theorem 6 on the structure of open sets on the line is tantamount to two assertions:

1) Every open set on the line is the union of a finite or countable number 
of components; 

2) Every open connected set on the line is an open interval.

The first assertion holds for open sets in $R^n$ (and in fact is susceptible to further generalizations), while the second assertion pertains specifically to the real line. ∎

⁹ By a polygonal line we mean a curve obtained by joining a finite number of straight line segments end to end.


7. Complete Metric Spaces

7.1. Definitions and examples. The reader is presumably already familiar 
with the notion of the completeness of the real line. The real line is, of course, 
a particularly simple example of a metric space. We now make the natural 
generalization of the notion of completeness to the case of an arbitrary 
metric space. 

Definition 1. A sequence $\{x_n\}$ of points in a metric space $R$ with metric $\rho$ is said to satisfy the Cauchy criterion if, given any $\varepsilon > 0$, there is an integer $N_\varepsilon$ such that $\rho(x_n, x_{n'}) < \varepsilon$ for all $n, n' > N_\varepsilon$.

Definition 2. A sequence $\{x_n\}$ of points in a metric space $R$ is called a Cauchy sequence (or a fundamental sequence) if it satisfies the Cauchy criterion.

Theorem 1. Every convergent sequence $\{x_n\}$ is fundamental.

Proof. If $\{x_n\}$ converges to a limit $x$, then, given any $\varepsilon > 0$, there is an integer $N_\varepsilon$ such that

$$\rho(x_n, x) < \frac{\varepsilon}{2}$$

for all $n > N_\varepsilon$. But then

$$\rho(x_n, x_{n'}) \le \rho(x_n, x) + \rho(x_{n'}, x) < \varepsilon$$

for all $n, n' > N_\varepsilon$. ∎
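As a concrete instance of Definitions 1 and 2 in $R^1$ (our own example, not one from the text), the partial sums $s_n = \sum_{k=1}^{n} 1/k^2$ form a fundamental sequence. For $n < n'$,

$$0 < s_{n'} - s_n = \sum_{k=n+1}^{n'} \frac{1}{k^2} < \sum_{k=n+1}^{n'} \frac{1}{(k-1)k} = \frac{1}{n} - \frac{1}{n'} < \frac{1}{n},$$

so $N_\varepsilon = \lceil 1/\varepsilon \rceil$ will do. The sketch below only spot-checks this choice of $N_\varepsilon$ on a finite window of indices with exact rational arithmetic. A finite check illustrates the criterion but proves nothing. The helper names `partial_sum` and `cauchy_index` are hypothetical.

```python
from fractions import Fraction
from math import ceil

def partial_sum(n):
    """s_n = 1 + 1/4 + ... + 1/n**2, computed exactly."""
    return sum(Fraction(1, k * k) for k in range(1, n + 1))

def cauchy_index(eps):
    """An index N_eps such that |s_n - s_m| < eps whenever n, m > N_eps."""
    return ceil(1 / eps)

eps = Fraction(1, 20)
N = cauchy_index(eps)
sums = {n: partial_sum(n) for n in range(N + 1, N + 40)}  # a finite window of indices > N
worst = max(abs(sums[n] - sums[m]) for n in sums for m in sums)
print(N, worst < eps)  # 20 True
```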

Definition 3. A metric space R is said to be complete if every Cauchy 
sequence in R converges to an element of R. Otherwise R is said to be 
incomplete. 

Example 1. Let R be the “space of isolated points” considered in Example 
1, p. 38. Then the Cauchy sequences in R are just the “stationary sequences,” 
i.e., the sequences $\{x_n\}$ all of whose terms are the same starting from some
index n. Every such sequence is obviously convergent to an element of R. 
Hence R is complete. 

Example 2. The completeness of the real line $R^1$ is familiar from elementary analysis.
Example 3. The completeness of Euclidean $n$-space $R^n$ follows from that of $R^1$. In fact, let

$$x^{(p)} = (x_1^{(p)}, \ldots, x_n^{(p)}) \qquad (p = 1, 2, \ldots)$$

be a fundamental sequence of points of $R^n$. Then, given any $\varepsilon > 0$, there exists an $N_\varepsilon$ such that

$$\sum_{k=1}^{n} (x_k^{(p)} - x_k^{(q)})^2 < \varepsilon^2$$

for all $p, q > N_\varepsilon$. It follows that

$$|x_k^{(p)} - x_k^{(q)}| < \varepsilon \qquad (k = 1, \ldots, n)$$

for all $p, q > N_\varepsilon$, i.e., each $\{x_k^{(p)}\}$ is a fundamental sequence in $R^1$. Let

$$x = (x_1, \ldots, x_n),$$

where

$$x_k = \lim_{p\to\infty} x_k^{(p)}.$$

Then obviously

$$\lim_{p\to\infty} x^{(p)} = x.$$

This proves the completeness of $R^n$. The completeness of the spaces $R_0^n$ and $R_1^n$ introduced in Examples 4 and 5, p. 39 is proved in almost the same way (give the details).

Example 4. Let $\{x_n(t)\}$ be a Cauchy sequence in the function space $C_{[a,b]}$ considered in Example 6, p. 39. Then, given any $\varepsilon > 0$, there is an $N_\varepsilon$ such that

$$|x_n(t) - x_{n'}(t)| < \varepsilon \qquad (1)$$

for all $n, n' > N_\varepsilon$ and all $t \in [a, b]$. It follows that the sequence $\{x_n(t)\}$ converges uniformly to some function $x(t)$. But the limit of a uniformly convergent sequence of continuous functions is itself a continuous function (see Problem 1). Taking the limit as $n' \to \infty$ in (1), we find that

$$|x_n(t) - x(t)| \le \varepsilon$$

for all $n > N_\varepsilon$ and all $t \in [a, b]$, i.e., $\{x_n(t)\}$ converges in the metric of $C_{[a,b]}$ to a function $x(t) \in C_{[a,b]}$. Hence $C_{[a,b]}$ is a complete metric space.

Example 5. Next let $\{x^{(n)}\}$ be a sequence in the space $l_2$ considered in Example 7, p. 39, so that

$$x^{(n)} = (x_1^{(n)}, x_2^{(n)}, \ldots, x_k^{(n)}, \ldots), \qquad \sum_{k=1}^{\infty} (x_k^{(n)})^2 < \infty \qquad (n = 1, 2, \ldots).$$
Suppose further that $\{x^{(n)}\}$ is a Cauchy sequence. Then, given any $\varepsilon > 0$, there is an $N_\varepsilon$ such that

$$\rho^2(x^{(n)}, x^{(n')}) = \sum_{k=1}^{\infty} (x_k^{(n)} - x_k^{(n')})^2 < \varepsilon \qquad (2)$$

if $n, n' > N_\varepsilon$. It follows that

$$(x_k^{(n)} - x_k^{(n')})^2 < \varepsilon \qquad (k = 1, 2, \ldots),$$

i.e., for every $k$ the sequence $\{x_k^{(n)}\}$ is fundamental and hence convergent. Let

$$x_k = \lim_{n\to\infty} x_k^{(n)}, \qquad x = (x_1, x_2, \ldots, x_k, \ldots).$$

Then, as we now show, $x$ is itself a point of $l_2$ and moreover $\{x^{(n)}\}$ converges to $x$ in the $l_2$ metric, so that $l_2$ is a complete metric space.

In fact, (2) implies

$$\sum_{k=1}^{M} (x_k^{(n)} - x_k^{(n')})^2 < \varepsilon \qquad (3)$$

for any fixed $M$. Holding $n$ fixed in (3) and taking the limit as $n' \to \infty$, we get

$$\sum_{k=1}^{M} (x_k^{(n)} - x_k)^2 \le \varepsilon. \qquad (4)$$

Since (4) holds for arbitrary $M$, we can in turn take the limit of (4) as $M \to \infty$, obtaining

$$\sum_{k=1}^{\infty} (x_k^{(n)} - x_k)^2 \le \varepsilon. \qquad (5)$$

Just as on p. 40, the convergence of the two series

$$\sum_{k=1}^{\infty} (x_k^{(n)})^2, \qquad \sum_{k=1}^{\infty} (x_k^{(n)} - x_k)^2$$

implies that of the series

$$\sum_{k=1}^{\infty} x_k^2.$$

This proves that $x \in l_2$. Moreover, since $\varepsilon$ is arbitrarily small, (5) implies

$$\lim_{n\to\infty} \rho(x^{(n)}, x) = \lim_{n\to\infty} \sqrt{\sum_{k=1}^{\infty} (x_k^{(n)} - x_k)^2} = 0,$$

i.e., $\{x^{(n)}\}$ converges to $x$ in the $l_2$ metric, as asserted.
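Example 6. It is easy to show that the space $C^2_{[a,b]}$ of Example 8, p. 40 is incomplete. If

$$\varphi_n(t) = \begin{cases} -1 & \text{if } -1 \le t \le -\dfrac{1}{n}, \\[4pt] nt & \text{if } -\dfrac{1}{n} \le t \le \dfrac{1}{n}, \\[4pt] 1 & \text{if } \dfrac{1}{n} \le t \le 1, \end{cases}$$

then $\{\varphi_n(t)\}$ is a fundamental sequence in $C^2_{[-1,1]}$, since

$$\int_{-1}^{1} [\varphi_n(t) - \varphi_{n'}(t)]^2 \, dt \le \frac{2}{\min\{n, n'\}}.$$

However, $\{\varphi_n(t)\}$ cannot converge to a function in $C^2_{[-1,1]}$. In fact, consider the discontinuous function

$$\psi(t) = \begin{cases} -1 & \text{if } t < 0, \\ 1 & \text{if } t \ge 0. \end{cases}$$

Then, given any function $f \in C^2_{[-1,1]}$, it follows from Schwarz’s inequality (obviously still valid for piecewise continuous functions) that

$$\left(\int_{-1}^{1} [f(t) - \psi(t)]^2 \, dt\right)^{1/2} \le \left(\int_{-1}^{1} [f(t) - \varphi_n(t)]^2 \, dt\right)^{1/2} + \left(\int_{-1}^{1} [\varphi_n(t) - \psi(t)]^2 \, dt\right)^{1/2}.$$

But the integral on the left is nonzero, by the continuity of $f$, and moreover it is clear that

$$\lim_{n\to\infty} \int_{-1}^{1} [\varphi_n(t) - \psi(t)]^2 \, dt = 0.$$

Therefore

$$\left(\int_{-1}^{1} [f(t) - \varphi_n(t)]^2 \, dt\right)^{1/2}$$

cannot converge to zero as $n \to \infty$.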
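The two integral facts used in Example 6 can be checked numerically. A direct computation gives $\int_{-1}^{1}[\varphi_n(t) - \psi(t)]^2\,dt = 2\int_0^{1/n}(1 - nt)^2\,dt = \frac{2}{3n}$, which tends to zero. In the sketch below, `phi` and `psi` transcribe $\varphi_n$ and $\psi$. The helper `l2_dist_sq` is an ordinary midpoint-rule approximation of $\int_{-1}^{1}[f(t) - g(t)]^2\,dt$ chosen for illustration, and its step count is arbitrary. The loop compares the numerical value with $2/(3n)$.

```python
def phi(n, t):
    """The continuous function phi_n of Example 6 on [-1, 1]."""
    if t <= -1 / n:
        return -1.0
    if t >= 1 / n:
        return 1.0
    return n * t

def psi(t):
    """The discontinuous step function psi of Example 6."""
    return -1.0 if t < 0 else 1.0

def l2_dist_sq(f, g, steps=200_000):
    """Midpoint-rule approximation of the integral of (f - g)**2 over [-1, 1]."""
    h = 2 / steps
    total = 0.0
    for i in range(steps):
        t = -1 + (i + 0.5) * h
        total += (f(t) - g(t)) ** 2
    return total * h

for n in (1, 5, 25):
    approx = l2_dist_sq(lambda t: phi(n, t), psi)
    print(n, round(approx, 6), round(2 / (3 * n), 6))  # numerical value vs. exact 2/(3n)
```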

7.2. The nested sphere theorem. A sequence of closed spheres

$$S[x_1, r_1], \ S[x_2, r_2], \ \ldots, \ S[x_n, r_n], \ \ldots$$

in a metric space $R$ is said to be nested (or decreasing) if

$$S[x_1, r_1] \supset S[x_2, r_2] \supset \cdots \supset S[x_n, r_n] \supset \cdots.$$

Using this concept, we can prove a simple criterion for the completeness of $R$:
Theorem 2 (Nested sphere theorem). A metric space $R$ is complete if and only if every nested sequence $\{S_n\} = \{S[x_n, r_n]\}$ of closed spheres in $R$ such that $r_n \to 0$ as $n \to \infty$ has a nonempty intersection

$$\bigcap_{n=1}^{\infty} S_n.$$

Proof. If $R$ is complete and if $\{S_n\} = \{S[x_n, r_n]\}$ is any nested sequence of closed spheres in $R$ such that $r_n \to 0$ as $n \to \infty$, then the sequence $\{x_n\}$ of centers of the spheres is fundamental, since $\rho(x_n, x_{n'}) \le r_n$ for $n' > n$ and $r_n \to 0$ as $n \to \infty$. Therefore $\{x_n\}$ has a limit. Let

$$x = \lim_{n\to\infty} x_n.$$

Then

$$x \in \bigcap_{n=1}^{\infty} S_n.$$

In fact, $S_n$ contains every point of the sequence $\{x_n\}$ except possibly the points $x_1, x_2, \ldots, x_{n-1}$, and hence $x$ is a contact point of every sphere $S_n$. But $S_n$ is closed, and hence $x \in S_n$ for all $n$.

Conversely, suppose every nested sequence of closed spheres in $R$ with radii converging to zero has a nonempty intersection, and let $\{x_n\}$ be any fundamental sequence in $R$. Then $\{x_n\}$ has a limit in $R$. To see this, use the fact that $\{x_n\}$ is fundamental to choose a term $x_{n_1}$ of the sequence $\{x_n\}$ such that

$$\rho(x_n, x_{n_1}) < \frac{1}{2}$$

for all $n \ge n_1$, and let $S_1$ be the closed sphere of radius 1 with center $x_{n_1}$. Then choose a term $x_{n_2}$ of $\{x_n\}$ such that $n_2 > n_1$ and

$$\rho(x_n, x_{n_2}) < \frac{1}{2^2}$$

for all $n \ge n_2$, and let $S_2$ be the closed sphere of radius $\frac{1}{2}$ with center $x_{n_2}$. Continue this construction indefinitely, i.e., once having chosen terms $x_{n_1}, x_{n_2}, \ldots, x_{n_k}$ ($n_1 < n_2 < \cdots < n_k$), choose a term $x_{n_{k+1}}$ such that $n_{k+1} > n_k$ and

$$\rho(x_n, x_{n_{k+1}}) < \frac{1}{2^{k+1}}$$

for all $n \ge n_{k+1}$, let $S_{k+1}$ be the closed sphere of radius $1/2^k$ with center $x_{n_{k+1}}$, and so on. This gives a nested sequence $\{S_k\}$ of closed spheres with radii converging to zero. By hypothesis, these spheres have a nonempty intersection, i.e., there is a point $x$ in all the spheres. This point is obviously the limit of the subsequence $\{x_{n_k}\}$. But if a fundamental sequence contains a subsequence converging to $x$, then the sequence itself must converge to $x$ (why?), i.e.,

$$\lim_{n\to\infty} x_n = x. \qquad \blacksquare$$

7.3. Baire’s theorem. It will be recalled from Sec. 6.3 that a subset $A$ of a metric space $R$ is said to be nowhere dense in $R$ if it is dense in no (open) sphere at all, or equivalently, if every sphere $S \subset R$ contains another sphere $S'$ such that $S' \cap A = \varnothing$ (check the equivalence). This concept plays an important role in

Theorem 3 (Baire). A complete metric space R cannot be represented 
as the union of a countable number of nowhere dense sets. 

Proof. Suppose to the contrary that

$$R = \bigcup_{n=1}^{\infty} A_n, \qquad (6)$$

where every set $A_n$ is nowhere dense in $R$. Let $S_0 \subset R$ be a closed sphere of radius 1. Since $A_1$ is nowhere dense in $S_0$, being nowhere dense in $R$, there is a closed sphere $S_1$ of radius less than $\frac{1}{2}$ such that $S_1 \subset S_0$ and $S_1 \cap A_1 = \varnothing$. Since $A_2$ is nowhere dense in $S_1$, being nowhere dense in $S_0$, there is a closed sphere $S_2$ of radius less than $\frac{1}{3}$ such that $S_2 \subset S_1$ and $S_2 \cap A_2 = \varnothing$, and so on. In this way, we get a nested sequence of closed spheres $\{S_n\}$ with radii converging to zero such that

$$S_n \cap A_n = \varnothing \qquad (n = 1, 2, \ldots).$$

By the nested sphere theorem, the intersection

$$\bigcap_{n=1}^{\infty} S_n$$

contains a point $x$. By construction, $x$ cannot belong to any of the sets $A_n$, i.e.,

$$x \notin \bigcup_{n=1}^{\infty} A_n.$$

It follows that

$$R \ne \bigcup_{n=1}^{\infty} A_n,$$

contrary to (6). Hence the representation (6) is impossible. ∎

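Corollary. A complete metric space $R$ without isolated points is uncountable.

Proof. Since $R$ has no isolated points, every single-element set $\{x\}$ is nowhere dense in $R$. If $R$ were countable, it would be the union of countably many such sets, contrary to Baire’s theorem. ∎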
7.4. Completion of a metric space. As we now show, an incomplete metric space can always be enlarged (in an essentially unique way) to give a complete metric space.¹⁰

Definition 4. Given a metric space $R$, a complete metric space $R^*$ is called a completion of $R$ if $R \subset R^*$ and $[R] = R^*$, where $[R]$ denotes the closure of $R$ in $R^*$, i.e., if $R$ is a subset of $R^*$ everywhere dense in $R^*$.

Example 1. Clearly R* = R if R is already complete (see Problem 7). 

Example 2. The space of all real numbers is the completion of the space 
of all rational numbers. 

Theorem 4. Every metric space R has a completion. This completion 
is unique to within an isometric mapping carrying every point $x \in R$ into itself.

Proof. The proof is somewhat lengthy, but completely straightforward. First we prove the uniqueness, showing that if $R^*$ and $R^{**}$ are two completions of $R$, then there is a one-to-one mapping $x^{**} = \varphi(x^*)$ of $R^*$ onto $R^{**}$ such that $\varphi(x) = x$ for all $x \in R$ and

$$\rho_1(x^*, y^*) = \rho_2(x^{**}, y^{**}) \qquad (7)$$

($y^{**} = \varphi(y^*)$), where $\rho_1$ is the distance in $R^*$ and $\rho_2$ the distance in $R^{**}$. The required mapping $\varphi$ is constructed as follows: Let $x^*$ be an arbitrary point of $R^*$. Then, by the definition of a completion, there is a sequence $\{x_n\}$ of points of $R$ converging to $x^*$. The points of the sequence $\{x_n\}$ also belong to $R^{**}$, where they form a fundamental sequence (why?). Therefore $\{x_n\}$ converges to a point $x^{**} \in R^{**}$, since $R^{**}$ is complete. It is clear that $x^{**}$ is independent of the choice of the sequence $\{x_n\}$ converging to the point $x^*$ (why?). If we set $\varphi(x^*) = x^{**}$, then $\varphi$ is the required mapping. In fact, $\varphi(x) = x$ for all $x \in R$, since if $x_n \to x \in R$, then obviously $x^* = x \in R^*$, $x^{**} = x$. Moreover, suppose $x_n \to x^*$, $y_n \to y^*$ in $R^*$, while $x_n \to x^{**}$, $y_n \to y^{**}$ in $R^{**}$. Then, if $\rho$ is the distance in $R$,

$$\rho_1(x^*, y^*) = \lim_{n\to\infty} \rho_1(x_n, y_n) = \lim_{n\to\infty} \rho(x_n, y_n) \qquad (8)$$

(see Problem 3, p. 54), while at the same time

$$\rho_2(x^{**}, y^{**}) = \lim_{n\to\infty} \rho_2(x_n, y_n) = \lim_{n\to\infty} \rho(x_n, y_n). \qquad (8')$$

But (8) and (8') together imply (7).

We must now prove the existence of a completion of $R$. Given an arbitrary metric space $R$, we say that two Cauchy sequences $\{x_n\}$ and $\{x'_n\}$ in $R$ are equivalent and write $\{x_n\} \sim \{x'_n\}$ if

$$\lim_{n\to\infty} \rho(x_n, x'_n) = 0.$$

As anticipated by the notation and terminology, $\sim$ is reflexive, symmetric and transitive, i.e., $\sim$ is an equivalence relation in the sense of Sec. 1.4. Therefore the set of all Cauchy sequences of points in the space $R$ can be partitioned into classes of equivalent sequences. Let these classes be the points of a new space $R^*$. Then we define the distance between two arbitrary points $x^*, y^* \in R^*$ by the formula

$$\rho_1(x^*, y^*) = \lim_{n\to\infty} \rho(x_n, y_n), \qquad (9)$$

where $\{x_n\}$ is any “representative” of $x^*$ (namely, any Cauchy sequence in the class $x^*$) and $\{y_n\}$ is any representative of $y^*$.

The next step is to verify that (9) is in fact a distance, i.e., that (9) exists, is independent of the choice of the sequences $\{x_n\} \in x^*$, $\{y_n\} \in y^*$, and satisfies the three properties of a distance figuring in Definition 1, p. 37. Given any $\varepsilon > 0$, it follows from the triangle inequality in $R$ (recall Problem 1b, p. 45) that

$$\begin{aligned}
|\rho(x_n, y_n) - \rho(x_{n'}, y_{n'})| &= |\rho(x_n, y_n) - \rho(x_{n'}, y_n) + \rho(x_{n'}, y_n) - \rho(x_{n'}, y_{n'})| \\
&\le |\rho(x_n, y_n) - \rho(x_{n'}, y_n)| + |\rho(x_{n'}, y_n) - \rho(x_{n'}, y_{n'})| \\
&\le \rho(x_n, x_{n'}) + \rho(y_n, y_{n'}) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \qquad (10)
\end{aligned}$$


for all sufficiently large $n$ and $n'$. Therefore the sequence of real numbers $\{s_n\} = \{\rho(x_n, y_n)\}$ is fundamental and hence has a limit. This limit is independent of the choice $\{x_n\} \in x^*$, $\{y_n\} \in y^*$. In fact, suppose

$$\{x_n\}, \{x'_n\} \in x^*, \qquad \{y_n\}, \{y'_n\} \in y^*.$$

Then

$$|\rho(x_n, y_n) - \rho(x'_n, y'_n)| \le \rho(x_n, x'_n) + \rho(y_n, y'_n),$$

by a calculation analogous to (10). But

$$\lim_{n\to\infty} \rho(x_n, x'_n) = \lim_{n\to\infty} \rho(y_n, y'_n) = 0,$$

since $\{x_n\} \sim \{x'_n\}$, $\{y_n\} \sim \{y'_n\}$, and hence

$$\lim_{n\to\infty} \rho(x_n, y_n) = \lim_{n\to\infty} \rho(x'_n, y'_n).$$

As for the three properties of a metric, it is obvious that $\rho_1(x^*, y^*) = \rho_1(y^*, x^*)$, and the fact that $\rho_1(x^*, y^*) = 0$ if and only if $x^* = y^*$ is an immediate consequence of the definition of equivalent Cauchy sequences. To verify the triangle inequality in $R^*$, we start from the triangle inequality

$$\rho(x_n, z_n) \le \rho(x_n, y_n) + \rho(y_n, z_n)$$

in the original space $R$ and then take the limit as $n \to \infty$, obtaining

$$\lim_{n\to\infty} \rho(x_n, z_n) \le \lim_{n\to\infty} \rho(x_n, y_n) + \lim_{n\to\infty} \rho(y_n, z_n),$$

i.e.,

$$\rho_1(x^*, z^*) \le \rho_1(x^*, y^*) + \rho_1(y^*, z^*).$$

We now come to the crucial step of showing that $R^*$ is a completion of $R$. Suppose that with every point $x \in R$, we associate the class $x^* \in R^*$ of all Cauchy sequences converging to $x$. Let

$$x = \lim_{n\to\infty} x_n, \qquad y = \lim_{n\to\infty} y_n.$$

Then clearly

$$\rho(x, y) = \lim_{n\to\infty} \rho(x_n, y_n)$$

(recall Problem 3, p. 54), while on the other hand

$$\rho_1(x^*, y^*) = \lim_{n\to\infty} \rho(x_n, y_n),$$

by definition. Therefore

$$\rho(x, y) = \rho_1(x^*, y^*),$$

and hence the mapping of $R$ into $R^*$ carrying $x$ into $x^*$ is isometric. Accordingly, we need no longer distinguish between the original space $R$ and its image in $R^*$, in particular between the two metrics $\rho$ and $\rho_1$ (recall the relevant comments on p. 44). In other words, $R$ can be regarded as a subset of $R^*$. The theorem will be proved once we succeed in showing that

1) $R$ is everywhere dense in $R^*$, i.e., $[R] = R^*$;

2) $R^*$ is complete.

To this end, given any point $x^* \in R^*$ and any $\varepsilon > 0$, choose a representative of $x^*$, namely a Cauchy sequence $\{x_n\}$ in the class $x^*$. Let $N$ be such that $\rho(x_n, x_{n'}) < \varepsilon$ for all $n, n' > N$. Then

$$\rho_1(x_n, x^*) = \lim_{n'\to\infty} \rho(x_n, x_{n'}) \le \varepsilon$$

if $n > N$, i.e., every neighborhood of the point $x^*$ contains a point of $R$. It follows that $[R] = R^*$.

Finally, to show that $R^*$ is complete, we first note that by the very definition of $R^*$, any Cauchy sequence $\{x_n\}$ consisting of points in $R$ converges to some point in $R^*$, namely to the point $x^* \in R^*$ defined by $\{x_n\}$. Moreover, since $R$ is dense in $R^*$, given any Cauchy sequence $\{x_n^*\}$ consisting of points in $R^*$, we can find an equivalent sequence $\{x_n\}$ consisting of points in $R$. In fact, we need only choose $x_n$ to be any point of $R$ such that $\rho_1(x_n, x_n^*) < 1/n$. The resulting sequence $\{x_n\}$ is fundamental, and, as just shown, converges to a point $x^* \in R^*$. But then the sequence $\{x_n^*\}$ also converges to $x^*$. ∎


Example. If R is the space of all rational numbers, then R* is the space of 
all real numbers, both equipped with the distance $\rho(x, y) = |x - y|$. In this
way, we can “construct the real number system.” However, there still 
remains the problem of suitably defining sums and products of real numbers 
and verifying that the usual axioms of arithmetic are satisfied. 
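To make the example concrete, the following Python sketch (an illustration of ours, not part of the original text) uses exact rational arithmetic from the standard `fractions` module. It builds two Cauchy sequences of rationals, the Newton iterates for $x^2 = 2$ and the decimal truncations of $\sqrt 2$. On finitely many terms it then checks two things: $\rho(x_n, y_n) = |x_n - y_n|$ shrinks, so the sequences look equivalent and represent the same point $x^*$ of the completion, and no term is an exact square root of 2. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt

def newton_sqrt2(n):
    """First n Newton iterates x_{k+1} = (x_k + 2 / x_k) / 2 from x_0 = 1 (all rational)."""
    x, terms = Fraction(1), []
    for _ in range(n):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms

def decimal_sqrt2(n):
    """Truncations of sqrt(2) to k decimal places, k = 1..n, as exact fractions."""
    # isqrt(2 * 10**(2k)) = floor(sqrt(2) * 10**k), computed exactly in integers
    return [Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k) for k in range(1, n + 1)]

def rho(x, y):
    """The distance rho(x, y) = |x - y| on the rationals."""
    return abs(x - y)

xs, ys = newton_sqrt2(6), decimal_sqrt2(6)
gaps = [float(rho(x, y)) for x, y in zip(xs, ys)]
print(gaps)                               # shrinking: {x_n} and {y_n} are equivalent
print(all(x * x != 2 for x in xs + ys))   # no term is an exact square root of 2
```

Only finite prefixes are examined, so this illustrates the construction rather than proving anything about it.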

Problem 1. Prove that the limit $f(t)$ of a uniformly convergent sequence of functions $\{f_n(t)\}$ continuous on $[a, b]$ is itself a function continuous on $[a, b]$.

Hint. Clearly

$$|f(t) - f(t_0)| \le |f(t) - f_n(t)| + |f_n(t) - f_n(t_0)| + |f_n(t_0) - f(t_0)|,$$

where $t, t_0 \in [a, b]$. Use the uniform convergence to make the sum of the first and third terms on the right small for sufficiently large $n$. Then use the continuity of $f_n(t)$ to make the second term small for $t$ sufficiently close to $t_0$.

Problem 2. Prove that the space m in Example 9, p. 41 is complete. 

Problem 3. Prove that if $R$ is complete, then the intersection figuring in Theorem 2 consists of a single point.

Problem 4. By the diameter of a subset $A$ of a metric space $R$ is meant the number

$$d(A) = \sup_{x, y \in A} \rho(x, y).$$


Suppose $R$ is complete, and let $\{A_n\}$ be a sequence of nonempty closed subsets of $R$ nested in the sense that

$$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots.$$

Suppose further that

$$\lim_{n\to\infty} d(A_n) = 0.$$

Prove that the intersection $\bigcap_{n=1}^{\infty} A_n$ is nonempty.


Problem 5. A subset A of a metric space R is said to be bounded if its 
diameter d(A) is finite. Prove that the union of a finite number of bounded 
sets is bounded. 
Problem 6. Give an example of a complete metric space $R$ and a nested sequence $\{A_n\}$ of nonempty closed subsets of $R$ such that

$$\bigcap_{n=1}^{\infty} A_n = \varnothing.$$

Reconcile this example with Problem 4.

Problem 7. Prove that a subspace of a complete metric space $R$ is complete if and only if it is closed.

Problem 8. Prove that the real line equipped with the distance

$$\rho(x, y) = |\arctan x - \arctan y|$$

is an incomplete metric space.

Problem 9. Give an example of a complete metric space homeomorphic 
to an incomplete metric space. 

Hint. Consider the example on p. 44. 

Comment. Thus homeomorphic metric spaces can have different “metric 
properties.” 

Problem 10. Carry out the program discussed in the last sentence of the 
example on p. 65. 

Hint. If $\{x_n\}$ and $\{y_n\}$ are Cauchy sequences of rational numbers serving as “representatives” of real numbers $x^*$ and $y^*$, respectively, define $x^* + y^*$ as the real number with representative $\{x_n + y_n\}$.
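The hint can be tried out numerically. The Python sketch below uses hypothetical helper names and representatives chosen by us. It adds two equivalent representatives of $1/3$ and two of $2/3$ termwise, and the sums stay close to each other. This reflects the inequality $|(x_n + y_n) - (x_n' + y_n')| \le |x_n - x_n'| + |y_n - y_n'|$, which makes $x^* + y^*$ independent of the choice of representatives. Only finite prefixes are checked, so this is an illustration, not a proof.

```python
from fractions import Fraction

def add_reps(xs, ys):
    """Termwise sum of two representatives (finite prefixes of Cauchy sequences)."""
    return [x + y for x, y in zip(xs, ys)]

def max_gap_tail(us, vs, start):
    """max |u_n - v_n| over indices >= start: a finite check of equivalence."""
    return max(abs(u - v) for u, v in zip(us[start:], vs[start:]))

N = 30
# Two equivalent representatives of 1/3 (hypothetical choices) ...
x = [Fraction(1, 3) + Fraction(1, 2 ** n) for n in range(1, N)]
x2 = [Fraction(1, 3) - Fraction(1, 10 ** n) for n in range(1, N)]
# ... and two equivalent representatives of 2/3.
y = [Fraction(2, 3) + Fraction((-1) ** n, 3 ** n) for n in range(1, N)]
y2 = [Fraction(2, 3)] * (N - 1)

s, s2 = add_reps(x, y), add_reps(x2, y2)
print(float(s[-1]), float(s2[-1]))        # both near 1 = 1/3 + 2/3
print(float(max_gap_tail(s, s2, 20)))     # small: the two sum sequences are equivalent
```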

8. Contraction Mappings 

8.1. Definition of a contraction mapping. The fixed point theorem. Let $A$ be a mapping of a metric space $R$ into itself. Then $x$ is called a fixed point of $A$ if $Ax = x$, i.e., if $A$ carries $x$ into itself. Suppose there exists a number $\alpha < 1$ such that

$$\rho(Ax, Ay) \le \alpha\,\rho(x, y) \qquad (1)$$

for every pair of points $x, y \in R$. Then $A$ is said to be a contraction mapping. Every contraction mapping is automatically continuous, since it follows from the “contraction condition” (1) that $Ax_n \to Ax$ whenever $x_n \to x$.

Theorem 1 (Fixed point theorem¹⁰). Every contraction mapping $A$ defined on a complete metric space $R$ has a unique fixed point.

¹⁰ Often called the method of successive approximations (see the remark following Theorem 1) or the principle of contraction mappings.



Proof. Given an arbitrary point $x_0 \in R$, let¹¹

$$x_1 = Ax_0, \quad x_2 = Ax_1 = A^2 x_0, \quad \ldots, \quad x_n = Ax_{n-1} = A^n x_0, \quad \ldots \qquad (2)$$

Then the sequence $\{x_n\}$ is fundamental. In fact, assuming to be explicit that $n \le n'$, we have

$$\begin{aligned}
\rho(x_n, x_{n'}) &= \rho(A^n x_0, A^{n'} x_0) \le \alpha^n \rho(x_0, x_{n'-n}) \\
&\le \alpha^n [\rho(x_0, x_1) + \rho(x_1, x_2) + \cdots + \rho(x_{n'-n-1}, x_{n'-n})] \\
&\le \alpha^n \rho(x_0, x_1)[1 + \alpha + \alpha^2 + \cdots + \alpha^{n'-n-1}] \le \alpha^n \rho(x_0, x_1) \frac{1}{1 - \alpha}.
\end{aligned}$$

But the expression on the right can be made arbitrarily small for sufficiently large $n$, since $\alpha < 1$. Since $R$ is complete, the sequence $\{x_n\}$, being fundamental, has a limit

$$x = \lim_{n\to\infty} x_n.$$

Then, by the continuity of $A$,

$$Ax = A \lim_{n\to\infty} x_n = \lim_{n\to\infty} Ax_n = \lim_{n\to\infty} x_{n+1} = x.$$

This proves the existence of a fixed point $x$. To prove the uniqueness of $x$, we note that if

$$Ax = x, \qquad Ay = y,$$

then (1) becomes

$$\rho(x, y) \le \alpha\,\rho(x, y).$$

But then $\rho(x, y) = 0$ since $\alpha < 1$, and hence $x = y$. ∎
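Letting $n' \to \infty$ in the chain of inequalities of the proof gives the a priori error estimate $\rho(x_n, x) \le \alpha^n \rho(x_0, x_1)/(1 - \alpha)$. Below is a minimal Python sketch of the successive approximations (2) on the real line with $\rho(x, y) = |x - y|$. It assumes the caller supplies a valid contraction constant `alpha`, and the function name and signature are ours.

```python
from typing import Callable, Tuple

def successive_approximations(
    A: Callable[[float], float],
    x0: float,
    alpha: float,
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> Tuple[float, int]:
    """Iterate x_n = A(x_{n-1}) until the a priori bound on rho(x_n, x) is <= tol.

    Assumes (without checking) that A maps the real line into itself and
    satisfies |A(x) - A(y)| <= alpha * |x - y| with 0 <= alpha < 1.
    Returns the last approximation x_n and the index n.
    """
    if not 0 <= alpha < 1:
        raise ValueError("a contraction needs 0 <= alpha < 1")
    x1 = A(x0)
    d01 = abs(x1 - x0)                     # rho(x_0, x_1)
    x, n = x1, 1
    while alpha ** n * d01 / (1 - alpha) > tol:
        if n >= max_iter:
            raise RuntimeError("no convergence within max_iter steps")
        x = A(x)
        n += 1
    return x, n

# Usage: A(x) = x/2 + 1 is a contraction of the real line with alpha = 1/2; fixed point 2.
x, n = successive_approximations(lambda t: t / 2 + 1, 0.0, 0.5)
print(x, n)
```

Stopping on the a priori bound, rather than on the difference of consecutive iterates, uses only $\rho(x_0, x_1)$ and $\alpha$, exactly the quantities appearing in the proof.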


Remark. The fixed point theorem can be used to prove existence and 
uniqueness theorems for solutions of equations of various types. Besides 
showing that an equation of the form $Ax = x$ has a unique solution, the
fixed point theorem also gives a practical method for finding the solution, i.e., 
calculation of the “successive approximations” (2). In fact, as shown in 
the proof, the approximations (2) actually converge to the solution of the 
equation Ax = x. For this reason, the fixed point theorem is often called 
the method of successive approximations. 


Example 1. Let $f$ be a function defined on the closed interval $[a, b]$ which maps $[a, b]$ into itself and satisfies a Lipschitz condition

$$|f(x_1) - f(x_2)| \le K |x_1 - x_2| \qquad (3)$$

with constant $K < 1$. Then $f$ is a contraction mapping, and hence, by Theorem 1, the sequence

$$x_0, \quad x_1 = f(x_0), \quad x_2 = f(x_1), \quad \ldots \qquad (4)$$

converges to the unique root of the equation $f(x) = x$. In particular, the contraction condition (3) holds (by the mean value theorem) if $f$ has a continuous derivative $f'$ on $[a, b]$ such that

$$|f'(x)| \le K < 1.$$

The behavior of the successive approximations (4) in the cases $0 < f'(x) < 1$ and $-1 < f'(x) < 0$ is shown in Figures 10 and 11.

¹¹ $A^2x$ means $A(Ax)$, $A^3x$ means $A(A^2x) = A^2(Ax)$, and so on.
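As an illustration (the choice of $f$ is ours), $f(x) = \cos x$ maps $[0, 1]$ into $[\cos 1, 1] \subset [0, 1]$. It also satisfies $|f'(x)| = |\sin x| \le \sin 1 \approx 0.84 < 1$ there, so (3) holds with $K = \sin 1$. The Python sketch below computes the successive approximations (4) from $x_0 = 0$ and checks the a priori error estimate from the proof of Theorem 1, using the last iterate as a stand-in for the exact root.

```python
import math

def iterate(f, x0, n):
    """Return the successive approximations x_0, x_1 = f(x_0), ..., x_n, as in (4)."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

K = math.sin(1.0)                   # Lipschitz constant of cos on [0, 1]
xs = iterate(math.cos, 0.0, 200)
root = xs[-1]                       # stand-in for the exact root of cos x = x
print(root, abs(math.cos(root) - root))
n = 50                              # a priori bound: |x_n - x| <= K**n |x_1 - x_0| / (1 - K)
print(abs(xs[n] - root) <= K ** n * abs(xs[1] - xs[0]) / (1 - K))
```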

Example 2. Consider the mapping $y = Ax$ of $n$-dimensional space into itself given by the system of linear equations

$$y_i = \sum_{j=1}^{n} a_{ij} x_j + b_i \qquad (i = 1, \ldots, n). \qquad (5)$$

If $A$ is a contraction mapping, we can use the method of successive approximations to solve the equation $Ax = x$. The conditions under which $A$ is a contraction mapping depend on the choice of metric. We now examine three cases:

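1) The space $R_0^n$ with metric

$$\rho(x, y) = \max_{1 \le i \le n} |x_i - y_i|.$$

In this case, if $y = Ax$ and $\bar y = A\bar x$,

$$\begin{aligned}
\rho(y, \bar y) = \max_i |y_i - \bar y_i| &= \max_i \Big| \sum_j a_{ij}(x_j - \bar x_j) \Big| \le \max_i \sum_j |a_{ij}|\,|x_j - \bar x_j| \\
&\le \max_i \sum_j |a_{ij}| \max_j |x_j - \bar x_j| = \Big(\max_i \sum_j |a_{ij}|\Big) \rho(x, \bar x),
\end{aligned}$$

and the contraction condition is

$$\sum_{j=1}^{n} |a_{ij}| \le \alpha < 1 \qquad (i = 1, \ldots, n). \qquad (6)$$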


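2) The space $R_1^n$ with metric

$$\rho(x, y) = \sum_{i=1}^{n} |x_i - y_i|.$$

Here

$$\begin{aligned}
\rho(y, \bar y) = \sum_i |y_i - \bar y_i| &= \sum_i \Big| \sum_j a_{ij}(x_j - \bar x_j) \Big| \\
&\le \sum_i \sum_j |a_{ij}|\,|x_j - \bar x_j| \le \Big(\max_j \sum_i |a_{ij}|\Big) \rho(x, \bar x),
\end{aligned}$$

and the contraction condition is now

$$\sum_{i=1}^{n} |a_{ij}| \le \alpha < 1 \qquad (j = 1, \ldots, n). \qquad (7)$$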

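3) Ordinary Euclidean space $R^n$ with metric

$$\rho(x, y) = \Big(\sum_{i=1}^{n} (x_i - y_i)^2\Big)^{1/2}.$$

Using the Cauchy-Schwarz inequality, we have

$$\rho^2(y, \bar y) = \sum_i \Big(\sum_j a_{ij}(x_j - \bar x_j)\Big)^2 \le \Big(\sum_i \sum_j a_{ij}^2\Big) \rho^2(x, \bar x),$$

and the contraction condition becomes

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^2 \le \alpha < 1. \qquad (8)$$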
Thus, if at least one of the conditions (6)-(8) holds, there exists a unique point $x = (x_1, x_2, \ldots, x_n)$ such that

$$x_i = \sum_{j=1}^{n} a_{ij} x_j + b_i \qquad (i = 1, \ldots, n). \qquad (9)$$

The sequence of successive approximations to this solution of the equation $x = Ax$ is of the form

$$x^{(0)} = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}), \quad x^{(1)} = (x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)}), \quad \ldots, \quad x^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)}), \quad \ldots,$$

where

$$x_i^{(k)} = \sum_{j=1}^{n} a_{ij} x_j^{(k-1)} + b_i,$$

and we can choose any point $x^{(0)}$ as the “zeroth approximation.”
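These successive approximations transcribe directly into a Python sketch using plain lists. The function name, the stopping rule and the sample system are our own assumptions.

```python
def solve_by_iteration(a, b, x0=None, tol=1e-12, max_iter=10_000):
    """Successive approximations x_i^(k) = sum_j a_ij x_j^(k-1) + b_i for system (9).

    Convergence is guaranteed by Theorem 1 when one of (6)-(8) holds; this
    sketch does not check them and simply stops when consecutive
    approximations differ by less than tol in the max metric.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n    # the "zeroth approximation"
    for _ in range(max_iter):
        x_new = [sum(a[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("successive approximations did not settle")

a = [[0.2, 0.1], [0.3, 0.4]]    # row sums 0.3 and 0.7: condition (6) holds
b = [1.0, 2.0]
x = solve_by_iteration(a, b)
print(x)                        # close to the exact solution (16/9, 38/9)
```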

Each of the conditions (6)-(8) is sufficient for applicability of the method 
of successive approximations, but none of them is necessary. In fact, examples 
can be constructed in which each of the conditions (6)-(8) is satisfied, but 
not the other two. 
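The last remark can be checked on $2 \times 2$ matrices. The Python sketch below computes the three constants appearing in (6), (7) and (8). The three sample matrices are our own choices, each satisfying exactly one of the conditions.

```python
def contraction_constants(a):
    """Constants appearing in conditions (6), (7), (8) for the n x n matrix a.

    Returns (row, col, sq):
      row = max_i sum_j |a_ij|   ((6) holds when row < 1, max metric)
      col = max_j sum_i |a_ij|   ((7) holds when col < 1, sum metric)
      sq  = sum_i sum_j a_ij**2  ((8) holds when sq < 1, Euclidean metric)
    """
    n = len(a)
    row = max(sum(abs(a[i][j]) for j in range(n)) for i in range(n))
    col = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    sq = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
    return row, col, sq

examples = {
    "only (6)": [[0.9, 0.0], [0.9, 0.0]],
    "only (7)": [[0.9, 0.9], [0.0, 0.0]],
    "only (8)": [[0.55, 0.55], [0.55, 0.0]],
}
for name, a in examples.items():
    row, col, sq = contraction_constants(a)
    print(name, [row < 1, col < 1, sq < 1])
```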

Theorem 1 has the following useful generalization, which will be needed 
later (see Example 2, p. 75): 

Theorem 1'. Given a continuous mapping $A$ of a complete metric space $R$ into itself, suppose $A^n$ is a contraction mapping ($n$ an integer $> 1$). Then $A$ has a unique fixed point.

Proof. Choosing any point $x_0 \in R$, let

$$x = \lim_{k\to\infty} A^{kn} x_0,$$

a limit which exists by Theorem 1, since the points $A^{kn} x_0 = (A^n)^k x_0$ are just the successive approximations for the contraction mapping $A^n$. Then, by the continuity of $A$,

$$Ax = \lim_{k\to\infty} A A^{kn} x_0.$$

But $A^n$ is a contraction mapping, and hence

$$\rho(A^{kn} A x_0, A^{kn} x_0) \le \alpha\,\rho(A^{(k-1)n} A x_0, A^{(k-1)n} x_0) \le \cdots \le \alpha^k \rho(Ax_0, x_0),$$

where $\alpha < 1$. It follows that

$$\rho(Ax, x) = \lim_{k\to\infty} \rho(A A^{kn} x_0, A^{kn} x_0) = 0,$$

i.e., $Ax = x$, so that $x$ is a fixed point of $A$. To prove the uniqueness of $x$, we merely note that if $A$ has more than one fixed point, then so does $A^n$, which is impossible, by Theorem 1, since $A^n$ is a contraction mapping. ∎


8.2. Contraction mappings and differential equations. The most interesting applications of Theorems 1 and 1' arise when the space $R$ is a function space. We can then use these theorems to prove a number of existence and uniqueness theorems for differential and integral equations, as shown in this section and the next.


Theorem 2 (Picard). Given a function $f(x, y)$ defined and continuous on a plane domain $G$ containing the point $(x_0, y_0)$,¹² suppose $f$ satisfies a Lipschitz condition of the form

$$|f(x, y) - f(x, \bar y)| \le M |y - \bar y|$$

in the variable $y$. Then there is an interval $|x - x_0| \le \delta$ in which the differential equation

$$\frac{dy}{dx} = f(x, y) \qquad (10)$$

has a unique solution

$$y = \varphi(x)$$

satisfying the initial condition

$$\varphi(x_0) = y_0. \qquad (11)$$


Proof. Together the differential equation (10) and the initial condition (11) are equivalent to the integral equation

$$\varphi(x) = y_0 + \int_{x_0}^{x} f(t, \varphi(t))\,dt. \qquad (12)$$

By the continuity of $f$, we have

$$|f(x, y)| \le K \qquad (13)$$

in some domain $G' \subset G$ containing the point $(x_0, y_0)$.¹³ Choose $\delta > 0$ such that

1) $(x, y) \in G'$ if $|x - x_0| \le \delta$, $|y - y_0| \le K\delta$;

2) $M\delta < 1$,

and let $C^*$ be the space of continuous functions $\varphi$ defined on the interval $|x - x_0| \le \delta$ and such that $|\varphi(x) - y_0| \le K\delta$, equipped with the metric

$$\rho(\varphi, \bar\varphi) = \max_x |\varphi(x) - \bar\varphi(x)|.$$

¹² By an $n$-dimensional domain we mean an open connected set in Euclidean $n$-space $R^n$ (connectedness is defined in Problem 12, p. 55).

¹³ In fact, $f$ is bounded on $[G']$ if $[G'] \subset G$ (cf. Theorem 2, p. 110).

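The space $C^*$ is complete, since it is a closed subspace of the space of all continuous functions on $[x_0 - \delta, x_0 + \delta]$. Consider the mapping $\psi = A\varphi$ defined by the integral equation

$$\psi(x) = y_0 + \int_{x_0}^{x} f(t, \varphi(t))\,dt \qquad (|x - x_0| \le \delta).$$

Clearly $A$ is a contraction mapping carrying $C^*$ into itself. In fact, if $\varphi \in C^*$, $|x - x_0| \le \delta$, then

$$|\psi(x) - y_0| = \Big|\int_{x_0}^{x} f(t, \varphi(t))\,dt\Big| \le \Big|\int_{x_0}^{x} |f(t, \varphi(t))|\,dt\Big| \le K |x - x_0| \le K\delta$$

by (13), and hence $\psi = A\varphi$ also belongs to $C^*$. Moreover, if $\bar\psi = A\bar\varphi$, then by the Lipschitz condition

$$|\psi(x) - \bar\psi(x)| \le \Big|\int_{x_0}^{x} |f(t, \varphi(t)) - f(t, \bar\varphi(t))|\,dt\Big| \le M\delta \max_x |\varphi(x) - \bar\varphi(x)|,$$

and hence

$$\rho(\psi, \bar\psi) \le M\delta\,\rho(\varphi, \bar\varphi)$$

after maximizing with respect to $x$. But $M\delta < 1$, so that $A$ is a contraction mapping. It follows from Theorem 1 that the equation $\varphi = A\varphi$, i.e., the integral equation (12), has a unique solution in the space $C^*$. ∎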
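Here is a numerical Python sketch of the Picard iterates $\varphi_{k+1} = A\varphi_k$ from the proof. The assumptions are ours: the integral in $A$ is replaced by the cumulative trapezoidal rule on a grid, only the half-interval $[x_0, x_0 + \delta]$ is treated, and the condition involving $K$ is not checked. The test equation $dy/dx = y$, $y(0) = 1$, has $M = 1$ and exact solution $e^x$.

```python
import math

def picard(f, x0, y0, delta, n_grid=2000, n_iter=30):
    """Approximate Picard iterates phi_{k+1}(x) = y0 + int_{x0}^x f(t, phi_k(t)) dt
    on [x0, x0 + delta], replacing the integral by the cumulative trapezoidal rule.
    """
    h = delta / n_grid
    xs = [x0 + k * h for k in range(n_grid + 1)]
    phi = [y0] * (n_grid + 1)                 # zeroth approximation phi_0(x) = y0
    for _ in range(n_iter):
        g = [f(t, p) for t, p in zip(xs, phi)]
        new = [y0]
        for k in range(1, n_grid + 1):
            new.append(new[-1] + h * (g[k - 1] + g[k]) / 2)
        phi = new
    return xs, phi

# dy/dx = y, y(0) = 1: Lipschitz constant M = 1, so delta = 0.5 gives M * delta < 1.
xs, phi = picard(lambda x, y: y, 0.0, 1.0, delta=0.5)
print(phi[-1], math.exp(0.5))
```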

Theorem 2 can easily be generalized to the case of systems of differential 
equations: 

Theorem 2'. Given $n$ functions $f_i(x, y_1, \ldots, y_n)$ defined and continuous on an $(n + 1)$-dimensional domain $G$ containing the point

$$(x_0, y_{01}, \ldots, y_{0n}),$$

suppose each $f_i$ satisfies a Lipschitz condition of the form

$$|f_i(x, y_1, \ldots, y_n) - f_i(x, \bar y_1, \ldots, \bar y_n)| \le M \max_{1 \le j \le n} |y_j - \bar y_j|$$

in the variables $y_1, \ldots, y_n$. Then there is an interval $|x - x_0| \le \delta$ in which the system of differential equations

$$\frac{dy_i}{dx} = f_i(x, y_1, \ldots, y_n) \qquad (i = 1, \ldots, n) \qquad (14)$$

has a unique solution

$$y_1 = \varphi_1(x), \quad \ldots, \quad y_n = \varphi_n(x)$$

satisfying the initial conditions

$$\varphi_1(x_0) = y_{01}, \quad \ldots, \quad \varphi_n(x_0) = y_{0n}. \qquad (15)$$

Proof. Together the differential equations (14) and the initial conditions (15) are equivalent to the system of integral equations

$$\varphi_i(x) = y_{0i} + \int_{x_0}^{x} f_i(t, \varphi_1(t), \ldots, \varphi_n(t))\,dt \qquad (i = 1, \ldots, n). \qquad (16)$$

By the continuity of the functions $f_i$, we have

$$|f_i(x, y_1, \ldots, y_n)| \le K \qquad (i = 1, \ldots, n) \qquad (17)$$

in some domain $G' \subset G$ containing the point $(x_0, y_{01}, \ldots, y_{0n})$. Choose $\delta > 0$ such that

1) $(x, y_1, \ldots, y_n) \in G'$ if $|x - x_0| \le \delta$, $|y_i - y_{0i}| \le K\delta$ for all $i = 1, \ldots, n$;

2) $M\delta < 1$.

This time let $C^*$ be the space of ordered $n$-tuples

$$\varphi = (\varphi_1, \ldots, \varphi_n)$$

of continuous functions $\varphi_1, \ldots, \varphi_n$ defined on the interval $|x - x_0| \le \delta$ such that $|\varphi_i(x) - y_{0i}| \le K\delta$ for all $i = 1, \ldots, n$, equipped with the metric

$$\rho(\varphi, \bar\varphi) = \max_{x, i} |\varphi_i(x) - \bar\varphi_i(x)|.$$

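Clearly $C^*$ is complete. Moreover, the mapping $\psi = A\varphi$ defined by the system of integral equations

$$\psi_i(x) = y_{0i} + \int_{x_0}^{x} f_i(t, \varphi_1(t), \ldots, \varphi_n(t))\,dt \qquad (|x - x_0| \le \delta,\ i = 1, \ldots, n)$$

is a contraction mapping carrying $C^*$ into itself. In fact, if

$$\varphi = (\varphi_1, \ldots, \varphi_n) \in C^*, \qquad |x - x_0| \le \delta,$$

then

$$|\psi_i(x) - y_{0i}| = \Big|\int_{x_0}^{x} f_i(t, \varphi_1(t), \ldots, \varphi_n(t))\,dt\Big| \le K |x - x_0| \le K\delta \qquad (i = 1, \ldots, n)$$

by (17), so that $\psi = (\psi_1, \ldots, \psi_n) \in C^*$. Moreover, if $\bar\psi = A\bar\varphi$, then by the Lipschitz condition

$$|\psi_i(x) - \bar\psi_i(x)| \le \Big|\int_{x_0}^{x} |f_i(t, \varphi_1(t), \ldots, \varphi_n(t)) - f_i(t, \bar\varphi_1(t), \ldots, \bar\varphi_n(t))|\,dt\Big| \le M\delta \max_{x, j} |\varphi_j(x) - \bar\varphi_j(x)|,$$

and hence

$$\rho(\psi, \bar\psi) \le M\delta\,\rho(\varphi, \bar\varphi)$$

after maximizing with respect to $x$ and $i$. But $M\delta < 1$, so that $A$ is a contraction mapping. It follows from Theorem 1 that the equation $\varphi = A\varphi$, i.e., the system of integral equations (16), has a unique solution in the space $C^*$. ∎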
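The same scheme works componentwise for the system (16). The Python sketch below keeps the same assumptions as for Theorem 2 (trapezoidal quadrature, half-interval only, hypothetical function name). It is tested on $y_1' = y_2$, $y_2' = -y_1$, where the Lipschitz condition of Theorem 2' holds with $M = 1$ and the solution through $(0, 1)$ is $(\sin x, \cos x)$.

```python
import math

def picard_system(f, x0, y0, delta, n_grid=2000, n_iter=30):
    """Approximate Picard iterates for the system (16) on [x0, x0 + delta].

    f(t, y) takes a tuple y = (y_1, ..., y_n) and returns the tuple
    (f_1(t, y), ..., f_n(t, y)); integrals use the cumulative trapezoidal rule.
    """
    h = delta / n_grid
    n = len(y0)
    xs = [x0 + k * h for k in range(n_grid + 1)]
    phi = [tuple(y0)] * (n_grid + 1)          # zeroth approximation: constant y0
    for _ in range(n_iter):
        g = [f(t, p) for t, p in zip(xs, phi)]
        new = [tuple(y0)]
        for k in range(1, n_grid + 1):
            new.append(tuple(new[-1][i] + h * (g[k - 1][i] + g[k][i]) / 2
                             for i in range(n)))
        phi = new
    return xs, phi

# y1' = y2, y2' = -y1 with y(0) = (0, 1); exact solution (sin x, cos x).
xs, phi = picard_system(lambda t, y: (y[1], -y[0]), 0.0, (0.0, 1.0), delta=0.5)
print(phi[-1], (math.sin(0.5), math.cos(0.5)))
```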

8.3. Contraction mappings and integral equations. We now show how the 
method of successive approximations can be used to prove the existence and 
uniqueness of solutions of integral equations. 


Example 1. By a Fredholm equation (of the second kind) is meant an integral equation of the form

$$f(x) = \lambda \int_a^b K(x, y) f(y)\,dy + \varphi(x), \qquad (18)$$

involving two given functions $K$ and $\varphi$, an unknown function $f$ and an arbitrary parameter $\lambda$. The function $K$ is called the kernel of the equation, and the equation is said to be homogeneous if $\varphi \equiv 0$ (but otherwise nonhomogeneous).

Suppose $K(x, y)$ and $\varphi(x)$ are continuous on the square $a \le x \le b$, $a \le y \le b$, so that in particular

$$|K(x, y)| \le M \qquad (a \le x \le b,\ a \le y \le b).$$

Consider the mapping $g = Af$ of the complete metric space $C_{[a,b]}$ into itself given by

$$g(x) = \lambda \int_a^b K(x, y) f(y)\,dy + \varphi(x).$$

Clearly, if $g_1 = Af_1$, $g_2 = Af_2$, then

$$\rho(g_1, g_2) = \max_x |g_1(x) - g_2(x)| \le |\lambda| M (b - a) \max_x |f_1(x) - f_2(x)| = |\lambda| M (b - a)\,\rho(f_1, f_2),$$

so that $A$ is a contraction mapping if

$$|\lambda| < \frac{1}{M(b - a)}. \qquad (19)$$

It follows from Theorem 1 that the integral equation (18) has a unique solution for any value of $\lambda$ satisfying (19). The successive approximations $f_0, f_1, \ldots, f_n, \ldots$ to this solution are given by

$$f_n(x) = \lambda \int_a^b K(x, y) f_{n-1}(y)\,dy + \varphi(x) \qquad (n = 1, 2, \ldots),$$

where any function continuous on $[a, b]$ can be chosen as $f_0$. Note that this argument justifies the method of successive approximations for the equation (18) only for sufficiently small $|\lambda|$.
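Below is a Python sketch of these successive approximations. For illustration we replace the integral by the trapezoidal rule on a uniform grid, and the helper name is hypothetical. For the kernel $K(x, y) = xy$ and $\varphi(x) = x$ on $[0, 1]$ we have $M = 1$, so (19) requires $|\lambda| < 1$. A short calculation gives the exact solution $f(x) = 3x/(3 - \lambda)$.

```python
def fredholm_iterate(K, phi, lam, a, b, n_grid=100, n_iter=40):
    """Successive approximations f_n(x) = lam * int_a^b K(x, y) f_{n-1}(y) dy + phi(x)
    for equation (18), with the trapezoidal rule on a uniform grid of [a, b].
    Returns the grid and the values of the last approximation on it.
    """
    h = (b - a) / n_grid
    ys = [a + k * h for k in range(n_grid + 1)]
    w = [h / 2 if k in (0, n_grid) else h for k in range(n_grid + 1)]   # trapezoid weights
    f = [phi(y) for y in ys]                                            # f_0 = phi
    for _ in range(n_iter):
        f = [lam * sum(w[k] * K(x, ys[k]) * f[k] for k in range(n_grid + 1)) + phi(x)
             for x in ys]
    return ys, f

# K(x, y) = x * y, phi(x) = x on [0, 1]: M = 1, so (19) needs |lam| < 1.
lam = 0.5
ys, f = fredholm_iterate(lambda x, y: x * y, lambda x: x, lam, 0.0, 1.0)
print(f[-1], 3 / (3 - lam))       # value at x = 1 versus the exact 3x / (3 - lam)
```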

Example 2. Next consider the Volterra equation

$$f(x) = \lambda \int_a^x K(x, y) f(y)\,dy + \varphi(x), \qquad (20)$$

which differs from the Fredholm equation (18) by having the variable $x$ rather than the fixed number $b$ as the upper limit of integration.¹⁴ It is easy to see that the method of successive approximations can be applied to the Volterra equation (20) for arbitrary $\lambda$, not just for sufficiently small $|\lambda|$ as in the case of the Fredholm equation (18). In fact, let $A$ be the mapping of $C_{[a,b]}$ into itself defined by

$$Af(x) = \lambda \int_a^x K(x, y) f(y)\,dy + \varphi(x),$$

and let $f_1, f_2 \in C_{[a,b]}$. Then

$$|Af_1(x) - Af_2(x)| = \Big|\lambda \int_a^x K(x, y)[f_1(y) - f_2(y)]\,dy\Big| \le |\lambda| M (x - a) \max_x |f_1(x) - f_2(x)|,$$

where

$$M = \max_{x, y} |K(x, y)|.$$

It follows that

$$|A^2 f_1(x) - A^2 f_2(x)| \le \lambda^2 M^2 \max_x |f_1(x) - f_2(x)| \int_a^x (t - a)\,dt = \lambda^2 M^2 \frac{(x - a)^2}{2} \max_x |f_1(x) - f_2(x)|,$$

and in general,

$$|A^n f_1(x) - A^n f_2(x)| \le |\lambda|^n M^n \frac{(x - a)^n}{n!} \max_x |f_1(x) - f_2(x)| \le |\lambda|^n M^n \frac{(b - a)^n}{n!} \max_x |f_1(x) - f_2(x)|,$$

which implies

$$\rho(A^n f_1, A^n f_2) \le |\lambda|^n M^n \frac{(b - a)^n}{n!}\,\rho(f_1, f_2).$$

¹⁴ Equation (20) can be regarded formally as a special case of (18) by extending the definition of the kernel, i.e., by setting

$$K(x, y) = 0 \quad \text{if } y > x.$$
But, given any $\lambda$, we can always choose $n$ large enough to make

$$|\lambda|^n M^n \frac{(b - a)^n}{n!} < 1,$$

i.e., $A^n$ is a contraction mapping for sufficiently large $n$. Since $A$ is also continuous (by the first of the above estimates, $\rho(Af_1, Af_2) \le |\lambda| M (b - a)\,\rho(f_1, f_2)$), it follows from Theorem 1' that the integral equation (20) has a unique solution for arbitrary $\lambda$.
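Two small Python checks of this argument follow; they are our own illustration and the names are hypothetical. The first finds the smallest $n$ with $|\lambda|^n M^n (b - a)^n / n! < 1$. The second runs the successive approximations for (20) with trapezoidal quadrature. With $K \equiv 1$, $\varphi \equiv 1$ on $[0, 1]$ the solution is $f(x) = e^{\lambda x}$, and $\lambda = 3$ violates the Fredholm condition (19), yet the iteration still converges.

```python
import math

def smallest_contracting_power(lam, M, length):
    """Smallest n >= 1 with (|lam| * M * length)**n / n! < 1, i.e. A^n is a contraction."""
    c = abs(lam) * M * length
    n, term = 1, c
    while term >= 1:
        n += 1
        term *= c / n                      # term = c**n / n!
    return n

def volterra_iterate(K, phi, lam, a, b, n_grid=200, n_iter=40):
    """Successive approximations f_n(x) = lam * int_a^x K(x, y) f_{n-1}(y) dy + phi(x)
    for equation (20), using the trapezoidal rule on a uniform grid of [a, b]."""
    h = (b - a) / n_grid
    xs = [a + k * h for k in range(n_grid + 1)]
    f = [phi(x) for x in xs]                                    # f_0 = phi
    for _ in range(n_iter):
        new = []
        for i, x in enumerate(xs):
            vals = [K(x, xs[k]) * f[k] for k in range(i + 1)]   # nodes in [a, x]
            integral = h * (sum(vals) - (vals[0] + vals[-1]) / 2)
            new.append(lam * integral + phi(x))
        f = new
    return xs, f

lam = 3.0              # |lam| * M * (b - a) = 3 > 1, so condition (19) fails
print(smallest_contracting_power(lam, 1.0, 1.0))
xs, f = volterra_iterate(lambda x, y: 1.0, lambda x: 1.0, lam, 0.0, 1.0)
print(f[-1], math.exp(lam))     # exact solution f(x) = exp(lam * x) at x = 1
```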

Problem 1. Let $A$ be a mapping of a metric space $R$ into itself. Prove that the condition

$$\rho(Ax, Ay) < \rho(x, y) \qquad (x \ne y)$$

is insufficient for the existence of a fixed point of $A$.

Problem 2. Let $F(x)$ be a continuously differentiable function defined on the interval $[a, b]$ such that $F(a) < 0$, $F(b) > 0$ and

$$0 < K_1 \le F'(x) \le K_2 \qquad (a \le x \le b).$$

Use Theorem 1 to find the unique root of the equation $F(x) = 0$.

Hint. Introduce the auxiliary function $f(x) = x - \lambda F(x)$, and choose $\lambda$ such that the theorem works for the equivalent equation $f(x) = x$.
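One possible choice is sketched below in Python. The choice $\lambda = 1/K_2$ and the test function $F(x) = x^3 + x - 1$ on $[0, 1]$, with $K_1 = 1$ and $K_2 = 4$, are ours. Then $f'(x) = 1 - F'(x)/K_2$ lies in $[0, 1 - K_1/K_2]$. So $f$ is nondecreasing, maps $[a, b]$ into itself (since $f(a) > a$ and $f(b) < b$), and is a contraction with constant $\alpha = 1 - K_1/K_2$. The stopping rule uses the estimate $\rho(x_{n+1}, x) \le \frac{\alpha}{1 - \alpha}\rho(x_n, x_{n+1})$.

```python
def root_by_contraction(F, a, b, K1, K2, tol=1e-12, max_iter=10_000):
    """Find the root of F on [a, b] as the fixed point of f(x) = x - F(x) / K2.

    Assumes F(a) < 0 < F(b) and 0 < K1 <= F'(x) <= K2 on [a, b] (not checked),
    so f maps [a, b] into itself with contraction constant alpha = 1 - K1 / K2.
    """
    alpha = 1.0 - K1 / K2
    x = (a + b) / 2                                  # any starting point in [a, b]
    for _ in range(max_iter):
        x_new = x - F(x) / K2
        # rho(x_new, root) <= alpha / (1 - alpha) * rho(x, x_new) <= alpha * tol
        if abs(x_new - x) <= tol * (1 - alpha):
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")

r = root_by_contraction(lambda x: x ** 3 + x - 1, 0.0, 1.0, 1.0, 4.0)
print(r, r ** 3 + r - 1)
```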

Problem 3. Devise a proof of the implicit function theorem based on the use of the fixed point theorem.¹⁵

Problem 4. Prove that the method of successive approximations can be used to solve the system (9) if $|a_{ij}| < 1/n$ (for all $i$ and $j$), but not if $|a_{ij}| = 1/n$.

Problem 5. Prove that the condition (6) is necessary for the mapping (5) to be a contraction mapping in the space $R_0^n$.

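Problem 6. Prove that any of the conditions (6)-(8) implies

$$\begin{vmatrix}
a_{11} - 1 & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - 1 & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - 1
\end{vmatrix} \ne 0.$$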

Comment. Hence the fact that the system (5) has a unique solution (under 
suitable conditions) follows from Cramer’s rule as well as from the fixed 
point theorem. 


¹⁵ See e.g., I. G. Petrovski, Ordinary Differential Equations (translated by R. A. Silverman), Prentice-Hall, Inc., Englewood Cliffs, N.J. (1966), p. 47.
Problem 7. Consider the nonlinear integral equation

$$f(x) = \lambda \int_a^b K(x, y; f(y))\,dy + \varphi(x) \qquad (21)$$

with continuous $K$ and $\varphi$, where $K$ satisfies a Lipschitz condition of the form

$$|K(x, y; z_1) - K(x, y; z_2)| \le M |z_1 - z_2|$$

in its “functional” argument. Prove that (21) has a unique solution for all

$$|\lambda| < \frac{1}{M(b - a)}.$$

Write the successive approximations to this solution.
TOPOLOGICAL SPACES


9. Basic Concepts 

9.1. Definitions and examples. In our study of metric spaces, we defined a number of key ideas like contact point, limit point, closure of a set, etc. In each case, the definition rests on the notion of a neighborhood, or, what amounts to the same thing, the notion of an open set. These notions (neighborhood and open set) were in turn defined by using the metric (or distance) in the given space. However, instead of introducing a metric in a given set $X$, we can go about things differently, by specifying a system of open sets in $X$ with suitable properties. This approach leads to the notion of a topological space. Metric spaces are topological spaces of a rather special (although very important) kind.

Definition 1. Given a set $X$, by a topology in $X$ is meant a system $\tau$ of subsets $G \subset X$, called open sets (relative to $\tau$), with the following two properties:

1) The set $X$ itself and the empty set $\varnothing$ belong to $\tau$;

2) Arbitrary (finite or infinite) unions $\bigcup_\alpha G_\alpha$ and finite intersections $\bigcap_{k=1}^{n} G_k$ of open sets belong to $\tau$.

Definition 2. By a topological space is meant a pair $(X, \tau)$, consisting of a set $X$ and a topology $\tau$ defined in $X$.

Just as a metric space is a pair consisting of a set $X$ and a metric defined in $X$, so a topological space is a pair consisting of a set $X$ and a topology defined in $X$. Thus, to specify a topological space, we must specify both a set $X$ and a topology in $X$, i.e., we must indicate which subsets of $X$ are to be regarded as “open (in $X$).” Clearly, we can equip one and the same set with various different topologies, thereby defining various different topological spaces. Nevertheless, we will usually denote a topological space, namely a pair $(X, \tau)$, by a single letter like $T$. Just as in the case of a metric space $R$, the elements of a topological space $T$ will be called the points of $T$.

By the closed sets of a topological space $T$, we mean the complements $T - G$ of the open sets $G$ of $T$. It follows from Definition 1 and the “duality principle” (see p. 4) that

1') The space $T$ itself and the empty set $\varnothing$ are closed;

2') Arbitrary (finite or infinite) intersections $\bigcap_\alpha F_\alpha$ and finite unions $\bigcup_{k=1}^{n} F_k$ of closed sets of $T$ are closed.

The natural way of introducing the concepts of neighborhood, contact point, limit point and closure of a set is now apparent:

a) By a neighborhood of a point $x$ in a topological space $T$ is meant any open set $G \subset T$ containing $x$;

b) A point $x \in T$ is called a contact point of a set $M \subset T$ if every neighborhood of $x$ contains at least one point of $M$;

c) A point $x \in T$ is called a limit point of a set $M \subset T$ if every neighborhood of $x$ contains infinitely many points of $M$;

d) The set of all contact points of a set $M \subset T$ is called the closure of $M$, denoted by $[M]$.

Example 1. According to Theorem 5, p. 50, the open sets in any metric 
space satisfy the two properties in Definition 1. Hence every metric space 
is a topological space as well. 

Example 2. Given any set $T$, suppose we regard every subset of $T$ as open. Then $T$ is a topological space (the properties in Definition 1 are obviously satisfied). In particular, every set $M \subset T$ is both open and closed, and every set $M \subset T$ coincides with its own closure. Note that the “discrete metric space” of Example 1, p. 38 has this topology (often called the discrete topology).

Example 3. As another extreme case, consider an arbitrary set $T$ equipped with a topology consisting of just two sets, the whole set $T$ and the empty set $\varnothing$. Then $T$ is a topological space, a kind of “space of coalesced points” (mainly of academic interest). Note that the closure of every nonempty set is the whole space $T$.

Example 4. Let $T$ be the set $\{a, b\}$, consisting of just two points $a$ and $b$, and let the open sets in $T$ be $T$ itself, the empty set and the single-element set $\{b\}$. Then the two properties in Definition 1 are satisfied, and $T$ is a topological space. The closed sets in this space are $T$ itself, the empty set and the set $\{a\}$. Note that the closure of $\{b\}$ is the whole space $T$.
9.2. Comparison of topologies. Let $\tau_1$ and $\tau_2$ be two topologies defined in the same set $X$.¹ Then we say that the topology $\tau_1$ is stronger than the topology $\tau_2$ (or equivalently that $\tau_2$ is weaker than $\tau_1$) if $\tau_2 \subset \tau_1$, i.e., if every set of the system $\tau_2$ is a set of the system $\tau_1$.

Theorem 1. The intersection $\tau = \bigcap_\alpha \tau_\alpha$ of any set of topologies in $X$ is itself a topology in $X$.

Proof. Clearly $\bigcap_\alpha \tau_\alpha$ contains $X$ and $\varnothing$. Moreover, since every $\tau_\alpha$ is closed (algebraically) under the operations of taking arbitrary unions and finite intersections, the same is true of $\bigcap_\alpha \tau_\alpha$. ∎

Corollary. Let $\mathfrak{B}$ be any system of subsets of a set $X$. Then there exists a minimal topology in $X$ containing $\mathfrak{B}$, i.e., a topology $\tau(\mathfrak{B})$ containing $\mathfrak{B}$ and contained in every topology containing $\mathfrak{B}$.

Proof. A topology containing $\mathfrak{B}$ always exists, e.g., the topology in which every subset of $X$ is open. The intersection of all topologies containing $\mathfrak{B}$ is a topology by Theorem 1, and it is the desired minimal topology $\tau(\mathfrak{B})$, often called the topology generated by the system $\mathfrak{B}$. ∎
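When $X$ is finite, $\tau(\mathfrak{B})$ can be computed directly. It is the smallest family containing $X$, $\varnothing$ and the sets of $\mathfrak{B}$ that is closed under pairwise unions and intersections: for finite $X$ such a family is a topology, and every topology containing $\mathfrak{B}$ contains it. Below is a Python sketch with a hypothetical function name, meaningful only for finite $X$.

```python
from itertools import combinations

def generated_topology(X, system):
    """Smallest topology on a finite set X containing every set in `system`.

    Repeatedly adds pairwise unions and intersections to {X, empty set} plus
    the given sets until nothing new appears.
    """
    X = frozenset(X)
    tau = {X, frozenset()} | {frozenset(S) for S in system}
    changed = True
    while changed:
        changed = False
        for G, H in list(combinations(tau, 2)):
            for new in (G & H, G | H):
                if new not in tau:
                    tau.add(new)
                    changed = True
    return tau

tau = generated_topology({1, 2, 3}, [{1, 2}, {2, 3}])
print(sorted(sorted(G) for G in tau))
```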

Let $\mathfrak{B}$ be a system of subsets of $X$ and $A$ a fixed subset of $X$. Then by the trace of the system $\mathfrak{B}$ on the set $A$ we mean the system $\mathfrak{B}_A$ consisting of all subsets of $X$ of the form $A \cap B$, $B \in \mathfrak{B}$. It is easy to see that the trace (on $A$) of a topology $\tau$ (defined in $X$) is a topology $\tau_A$ in $A$. (Such a topology is often called a relative topology.) In this sense, every subset $A$ of a given topological space $(X, \tau)$ generates a new topological space $(A, \tau_A)$, called a subspace of the original topological space $(X, \tau)$.

9.3. Bases. Axioms of countability. As we have seen, defining a topology in a space $T$ means specifying a system of open sets in $T$. However, in many concrete problems, it is more convenient to specify, instead of all the open sets, some system of subsets which uniquely determines all the open sets. For example, in the case of a metric space we first introduced the notion of an open sphere ($\varepsilon$-neighborhood) and then defined an open set $G$ as a set such that every point $x \in G$ has a neighborhood $O_\varepsilon(x) \subset G$. In other words, the open sets in a metric space are precisely those which can be represented as finite or infinite unions of open spheres. In particular, the open sets on the real line are precisely those which can be represented as finite or countable unions of open intervals (recall Theorem 6, p. 51). These considerations suggest


¹ This gives two topological spaces $T_1 = (X, \tau_1)$ and $T_2 = (X, \tau_2)$.


Definition 3. A family $\mathfrak{B}$ of open subsets of a topological space $T$ is called a base for $T$ if every open set in $T$ can be represented as a union of sets in $\mathfrak{B}$.

Example 1. The set of all open spheres (of all possible radii and with all 
possible centers) in a metric space R is a base for R. In particular, the set 
of all open intervals is a base on the real line. The set of all open intervals 
with rational end points is also a base on the line, since any open interval 
(and hence any open set on the line) can be represented as a union of such 
intervals. 


It is clear from the foregoing that a topology $\tau$ can be defined in a set $T$ by specifying a base $\mathfrak{B}$ in $T$. This topology $\tau$ is just the system of sets which can be represented as unions of sets in $\mathfrak{B}$. If this way of specifying a topology is to be of practical value, we must find requirements which, when imposed on a system $\mathfrak{B}$ of subsets of a given set $T$, guarantee that the system $\tau$ of all possible unions of sets in $\mathfrak{B}$ be a topology in $T$, i.e., that $\tau$ have the two properties figuring in Definition 1:

Theorem 2. Given a set $T$, let $\mathfrak{B}$ be a system of subsets $G_\alpha \subset T$ with the following two properties:

1) Every point $x \in T$ belongs to at least one $G_\alpha \in \mathfrak{B}$;

2) If $x \in G_\alpha \cap G_\beta$, then there is a $G_\gamma \in \mathfrak{B}$ such that $x \in G_\gamma \subset G_\alpha \cap G_\beta$.

Suppose the empty set $\varnothing$ and all sets representable as unions of sets $G_\alpha$ are designated as open. Then $T$ is a topological space, and $\mathfrak{B}$ is a base for $T$.


Proof. It follows at once from the conditions of the theorem that the whole set $T$ (the union of all the $G_\alpha$, by property 1) and the empty set $\varnothing$ are open sets, and that the union of any number of open sets is open. We must still show that the intersection of a finite number of open sets is open. It is enough to prove this for just two sets. Thus let

$$A = \bigcup_\alpha G_\alpha, \qquad B = \bigcup_\beta G_\beta.$$

Then

$$A \cap B = \bigcup_{\alpha, \beta} (G_\alpha \cap G_\beta). \qquad (1)$$

By hypothesis, given any point $x \in G_\alpha \cap G_\beta$, there is a $G_\gamma \in \mathfrak{B}$ such that $x \in G_\gamma \subset G_\alpha \cap G_\beta$. Hence the set $G_\alpha \cap G_\beta$ is open, being the union of all $G_\gamma$ contained in $G_\alpha \cap G_\beta$. But then (1) is also open. Therefore $T$ is a topological space. The fact that $\mathfrak{B}$ is a base for $T$ is clear from the way open sets in $T$ are defined. ∎
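For finite examples, the two conditions of Theorem 2 and the resulting topology can be computed directly. In the Python sketch below (function names ours), the first family satisfies both conditions. The second violates condition 2), and its set of unions is then not closed under intersection.

```python
from itertools import combinations

def satisfies_theorem2(T, B):
    """Check conditions 1) and 2) of Theorem 2 for a family B of subsets of a finite T."""
    B = [frozenset(G) for G in B]
    covers = all(any(x in G for G in B) for x in T)                  # condition 1)
    refines = all(                                                   # condition 2)
        any(x in Gc and Gc <= Ga & Gb for Gc in B)
        for Ga in B for Gb in B for x in Ga & Gb
    )
    return covers and refines

def topology_from_base(B):
    """The empty set together with all unions of subfamilies of a finite family B."""
    B = [frozenset(G) for G in B]
    tau = {frozenset()}
    for r in range(1, len(B) + 1):
        for family in combinations(B, r):
            tau.add(frozenset().union(*family))
    return tau

T = {1, 2, 3}
good = [{1, 2}, {2, 3}, {2}]
bad = [{1, 2}, {2, 3}]
print(satisfies_theorem2(T, good), satisfies_theorem2(T, bad))
print(sorted(sorted(G) for G in topology_from_base(good)))
```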


The following theorem is a useful tool for deciding whether or not a 
given system of open sets is a base: 
Theorem 3. A system $\mathfrak{B}$ of open sets $G_\alpha$ in a topological space $T$ is a base for $T$ if and only if, given any open set $G \subset T$ and any point $x \in G$, there is a set $G_\alpha \in \mathfrak{B}$ such that $x \in G_\alpha \subset G$.

Proof. If $\mathfrak{B}$ is a base for $T$, then every open set $G \subset T$ is a union

$$G = \bigcup_\alpha G_\alpha$$

of sets $G_\alpha \in \mathfrak{B}$. Therefore every point $x \in G$ is contained in some set $G_\alpha \subset G$. Conversely, given any open set $G \subset T$, suppose that for every point $x \in G$ there is a set $G_{\alpha(x)} \in \mathfrak{B}$ such that $x \in G_{\alpha(x)} \subset G$. Then

$$G = \bigcup_{x \in G} G_{\alpha(x)},$$

i.e., $G$ is a union of sets in $\mathfrak{B}$. ∎

Example 2. It follows from Theorem 3 that the set of all open spheres 
with rational radii (and all possible centers) in a metric space R is a base for 
R (this is obvious anyway). In particular, as already noted in Example 1, 
the set of all open intervals with rational end points is a base for the real line. 

An important class of topological spaces consists of spaces with a countable 
base, i.e., spaces in which there is at least one base containing no more than 
countably many sets. Such a space is also said to satisfy the second axiom of 
countability. 

Theorem 4. If a topological space $T$ has a countable base, then $T$ contains a countable everywhere dense subset, i.e., a countable set $M \subset T$ such that $[M] = T$.

Proof. Let $\mathfrak{B} = \{G_1, G_2, \ldots, G_n, \ldots\}$ be a countable base for $T$ (we may discard the empty set, so that every $G_n$ is nonempty), and choose a point $x_n$ in each $G_n$. Then the set

$$M = \{x_1, x_2, \ldots, x_n, \ldots\}$$

is countable. Moreover, $M$ is everywhere dense in $T$, since otherwise the nonempty open set $G = T - [M]$ would contain no points of $M$. But this is impossible, since $G$ is a union of some of the sets $G_n$ in $\mathfrak{B}$ and $G_n$ contains the point $x_n \in M$. ∎

For metric spaces, we can say even more: 

Theorem 5. If a metric space $R$ has a countable everywhere dense subset, then $R$ has a countable base.

Proof. Suppose $R$ has a countable everywhere dense subset $\{x_1, x_2, \ldots, x_n, \ldots\}$. Then, given any open set $G \subset R$ and any $x \in G$, there is an open sphere $S(x_m, 1/n)$ such that $x \in S(x_m, 1/n) \subset G$ for suitable positive integers $m$ and $n$ (why?). Hence, by Theorem 3, the open spheres $S(x_m, 1/n)$, where $m$ and $n$ range over all positive integers, form a countable base for $R$. ∎

Combining Theorems 4 and 5, we see that a metric space R has a countable 
base if and only if it has a countable everywhere dense subset. 

Example 3. Every separable metric space, i.e., every metric space with a 
countable everywhere dense subset, is a metric space satisfying the second 
axiom of countability. 

Example 4. The space m of all bounded sequences is not separable (recall 
Example 7, p. 48) and hence has no countable base. 

Remark. In general, Theorem 5 does not hold for arbitrary (nonmetric) 
topological spaces. In fact, examples can be given of topological spaces 
which have a countable everywhere dense subset but no countable base. Let us 
see how this might come about. Given any point $x$ of a metric space $R$, there is a countable neighborhood base (or local base) at $x$, i.e., a countable system $\mathcal{O}$ of neighborhoods of $x$ with the following property: Given any open set $G$ containing $x$, there is a neighborhood $O \in \mathcal{O}$ such that $O \subset G$ (cf. Theorem 3).² Suppose every point $x$ of a topological space $T$ has a countable neighborhood base. Then $T$ is said to satisfy the first axiom of countability.
However, this axiom need not be satisfied in an arbitrary topological space. 
Hence the argument used in the case of metric spaces to deduce the existence 
of a countable base from that of a countable everywhere dense subset does 
not carry over to the case of an arbitrary topological space. 

A system $\mathfrak{M}$ of sets $M_\alpha$ is called a cover (or covering) of a topological space $T$, and is said to cover $T$, if

$$T = \bigcup_\alpha M_\alpha.$$

A cover consisting of open (or closed) sets only is called an open (or closed) cover. If $\mathfrak{M}$ is a cover of a topological space $T$, then by a subcover of $\mathfrak{M}$ we mean any subset of $\mathfrak{M}$ which also covers $T$.

Theorem 6. If $T$ is a topological space with a countable base $\mathfrak{B}$, then every open cover $\mathcal{O}$ has a finite or countable subcover.

Proof. Since $\mathcal{O}$ covers $T$, each point $x \in T$ belongs to some open set $O_\alpha \in \mathcal{O}$. Moreover, since $\mathfrak{B}$ is a countable base for $T$, for each $x \in T$ there is a set $G_{n(x)} \in \mathfrak{B}$ such that $x \in G_{n(x)} \subset O_\alpha$ (recall Theorem 3). The collection of all sets $G_{n(x)}$ selected in this way is finite or countable and covers $T$. For each $G_{n(x)}$ we now choose one of the sets $O_\alpha$ containing $G_{n(x)}$, thereby obtaining a finite or countable subcover of $\mathcal{O}$. ∎

² For example, the set of open spheres $S(x, 1/n)$ is a countable neighborhood base at any point $x$ of a metric space $R$.

Given any topological space $T$, the empty set $\varnothing$ and the space $T$ itself are both open and closed, by definition. A topological space $T$ is said to be connected if it has no subsets other than $\varnothing$ and $T$ which are both open and closed. For example, the real line $R^1$ is connected, but not the set $R^1 - \{x\}$ obtained from $R^1$ by deleting any point $x$.

9.4. Convergent sequences in a topological space. The concept of a convergent sequence, introduced in Sec. 6.2 for the case of a metric space, generalizes in the natural way to the case of a topological space. Thus a sequence of points $\{x_n\} = x_1, x_2, \ldots, x_n, \ldots$ in a topological space $T$ is said to converge to a point $x \in T$ (called the limit of the sequence) if every neighborhood $G(x)$ of $x$ contains all points $x_n$ starting from a certain index.³ However, the concept of a convergent sequence does not play the same basic role for topological spaces as for metric spaces. In fact, in the case of a metric space $R$, a point $x$ is a contact point of a set $M \subset R$ if and only if $M$ contains a sequence converging to $x$. On the other hand, in the case of a topological space $T$, this is in general not true, as shown by Problem 11. In other words, a point $x$ can be a contact point of a set $M \subset T$ (i.e., $x$ can belong to $[M]$) without $M$ containing a sequence converging to $x$. However, convergent sequences “are given their rights back” if $T$ satisfies the first axiom of countability, i.e., if there is a countable neighborhood base at every point $x \in T$:

Theorem 7. If a topological space $T$ satisfies the first axiom of
countability, then every contact point $x$ of a set $M \subset T$ is the limit of a
convergent sequence of points in $M$.

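Proof. Let $\mathcal{O}$ be a countable neighborhood base at $x$, consisting of
sets $O_n$. It can be assumed that $O_{n+1} \subset O_n$ $(n = 1, 2, \ldots)$, since otherwise
we need only replace $O_n$ by

$$\bigcap_{k=1}^{n} O_k.$$

Let $x_n$ be any point of $M$ contained in $O_n$. Such a point $x_n$ can always be
found, since $x$ is a contact point of $M$. Then the sequence $\{x_n\}$ converges to
$x$: every neighborhood of $x$ contains some $O_N$, and hence contains all the
points $x_n \in O_n \subset O_N$ with $n \ge N$. |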

Remark. As already noted, every metric space satisfies the first axiom
of countability. This, together with Theorem 7, shows why in the case of
metric spaces we were able to formulate concepts like contact point, limit
point, etc. in terms of convergent sequences (recall Theorems 2 and 2',
p. 48).

³ More exactly, if, given any $G(x)$, there is an integer $N_G$ such that $G(x)$ contains all
points $x_n$ with $n > N_G$.

9.5. Axioms of separation. Although many basic concepts of the theory 
of metric spaces carry over easily to the case of topological spaces, an 
arbitrary topological space is still too general an object for most problems 
of analysis. In fact, things can happen in an arbitrary topological space 
which differ in an essential way from what happens in a metric space. Thus, 
for example, a finite set of points need not be closed in an arbitrary
topological space, as shown in Example 4, p. 79. Hence it is desirable to
specialize the notion of a topological space somewhat by considering
topological spaces more closely resembling metric spaces. This is done by
imposing extra conditions on a topological space T, in addition to the two 
defining properties figuring in Definition 1, p. 78. For example, as we 
have already seen, the axioms of countability allow us to study topological 
spaces from the standpoint of the concept of convergence. We now introduce 
supplementary conditions, called axioms of separation, of quite a different 
type: 

Definition 4. Suppose that for each pair of distinct points $x$ and $y$ in
a topological space $T$, there is a neighborhood $O_x$ of $x$ and a neighborhood
$O_y$ of $y$ such that $x \notin O_y$, $y \notin O_x$. Then $T$ is said to satisfy the first axiom of
separation, and is called a $T_1$-space.

Example 1. The space in Example 2, p. 79 is a $T_1$-space, but not the space
in Example 4.

Theorem 8. Every finite subset of a $T_1$-space is closed.

Proof. Given any single-element set $\{x\}$, suppose $y \ne x$. Then $y$
has a neighborhood $O_y$ which does not contain $x$, i.e., $y \notin [\{x\}]$. Therefore
$[\{x\}] = \{x\}$, i.e., every “singleton” $\{x\}$ is closed. But every finite
union of closed sets is itself closed. Hence every finite subset of the given
space is closed. |
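On a finite space both the first axiom of separation and the conclusion of Theorem 8 can be checked by brute force. The following is a minimal sketch. It assumes the topology is given as an explicit list of open sets and reads “neighborhood” as “open set containing the point.” The helper names and example spaces are hypothetical. On these examples the two tests agree, as Theorem 8 together with its converse (Problem 12) predicts.

```python
def is_t1(points, opens):
    """First axiom of separation: for every ordered pair x != y there is an
    open set containing y but not x (checking ordered pairs covers both O_x and O_y)."""
    open_sets = [frozenset(U) for U in opens]
    return all(any(y in U and x not in U for U in open_sets)
               for x in points for y in points if x != y)

def singletons_closed(points, opens):
    """Every one-point set {x} is closed, i.e. its complement is open."""
    whole = frozenset(points)
    open_sets = {frozenset(U) for U in opens}
    return all(whole - {x} in open_sets for x in points)

pts = {"a", "b"}
hypothetical = [set(), {"b"}, {"a", "b"}]      # {b} is open, {a} is not
discrete = [set(), {"a"}, {"b"}, {"a", "b"}]
print(is_t1(pts, hypothetical), singletons_closed(pts, hypothetical))  # False False
print(is_t1(pts, discrete), singletons_closed(pts, discrete))          # True True
```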

The next axiom of separation is stronger than the first axiom: 

Definition 5. Suppose that for each pair of distinct points $x$ and $y$ in
a topological space $T$, there is a neighborhood $O_x$ of $x$ and a neighborhood
$O_y$ of $y$ such that $O_x \cap O_y = \emptyset$. Then $T$ is said to satisfy the second (or
Hausdorff) axiom of separation, and is called a $T_2$-space or Hausdorff
space.

Thus, roughly speaking, each pair of distinct points in a Hausdorff space
has a pair of disjoint neighborhoods.
Example 2. Every Hausdorff space is a $T_1$-space, but not conversely (see
Problem 10).

Topological spaces more general than Hausdorff spaces are rarely used 
in analysis. In fact, most of the topological spaces of interest in analysis 
satisfy a separation condition even stronger than the second axiom of 
separation: 

Definition 6. A $T_1$-space $T$ is said to be normal if for each pair of
disjoint closed sets $F_1$ and $F_2$ in $T$, there is an open set $O_1$ containing $F_1$
and an open set $O_2$ containing $F_2$ such that $O_1 \cap O_2 = \emptyset$.

In other words, each pair of disjoint closed sets in a normal space has a 
pair of disjoint “neighborhoods.” 

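Example 3. Every normal space is a Hausdorff space, since one-point sets
are closed in a $T_1$-space (Theorem 8), so that two distinct points form a
pair of disjoint closed sets.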

Example 4. Consider the closed unit interval $[0, 1]$, where neighborhoods
of any point $x \ne 0$ are defined in the usual way (i.e., as open sets containing
$x$), but neighborhoods of the point $x = 0$ are all half-open intervals $[0, \alpha)$
with the points

$$\frac{1}{2}, \ \frac{1}{3}, \ \ldots, \ \frac{1}{n}, \ \ldots \qquad (2)$$

deleted (and arbitrary unions and finite intersections of these neighborhoods
with neighborhoods of nonzero points). This space is Hausdorff, but not
normal since the set $\{0\}$ and the set of points (2) are disjoint closed sets
without disjoint neighborhoods.

Theorem 9. Every metric space is normal. 

Proof. Let $X$ and $Y$ be any two disjoint closed subsets of $R$. Every
point $x \in X$ has a neighborhood $O_x$ disjoint from $Y$, and hence is at a
positive distance $\rho_x$ from $Y$ (recall Problem 9, p. 54). Similarly, every
point $y \in Y$ is at a positive distance $\rho_y$ from $X$. Consider the open sets

$$U = \bigcup_{x \in X} S\left(x, \tfrac{1}{2}\rho_x\right), \qquad V = \bigcup_{y \in Y} S\left(y, \tfrac{1}{2}\rho_y\right),$$

where, as usual, $S(x, r)$ is the open sphere with center $x$ and radius $r$.

It is clear that $X \subset U$, $Y \subset V$. Moreover, $U$ and $V$ are disjoint. In fact,
suppose to the contrary that there is a point $z \in U \cap V$. Then there are
points $x_0 \in X$, $y_0 \in Y$ such that

$$\rho(x_0, z) < \tfrac{1}{2}\rho_{x_0}, \qquad \rho(z, y_0) < \tfrac{1}{2}\rho_{y_0}.$$

Assume, to be explicit, that $\rho_{x_0} \le \rho_{y_0}$. Then

$$\rho(x_0, y_0) \le \rho(x_0, z) + \rho(z, y_0) < \tfrac{1}{2}\rho_{x_0} + \tfrac{1}{2}\rho_{y_0} \le \rho_{y_0},$$

i.e., $x_0 \in S(y_0, \rho_{y_0})$. This contradicts the definition of $\rho_{y_0}$, and shows that
there is no point $z \in U \cap V$. |
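The construction in the proof can be checked numerically in the Euclidean plane. The sketch below assumes finite sets $X$ and $Y$, which are closed in a metric space. It computes the distances $\rho_x = \rho(x, Y)$ and $\rho_y = \rho(y, X)$ and verifies that no sphere $S(x, \tfrac12\rho_x)$ meets any sphere $S(y, \tfrac12\rho_y)$. The helper names and the random data are illustrative only.

```python
import math
import random

def dist_to_set(p, S):
    """rho(p, S): distance from the point p to the finite set S."""
    return min(math.dist(p, q) for q in S)

def separating_spheres_disjoint(X, Y):
    """Check that S(x, rho_x/2) and S(y, rho_y/2) are disjoint for all x in X,
    y in Y, with rho_x = rho(x, Y) and rho_y = rho(y, X) as in the proof."""
    rho_X = {x: dist_to_set(x, Y) for x in X}
    rho_Y = {y: dist_to_set(y, X) for y in Y}
    # Two open spheres are disjoint when the distance between their centers
    # is at least the sum of their radii.
    return all(math.dist(x, y) >= rho_X[x] / 2 + rho_Y[y] / 2
               for x in X for y in Y)

random.seed(0)
# Two interleaved random point sets in the unit square
X = [(random.random(), random.random()) for _ in range(40)]
Y = [(random.random(), random.random()) for _ in range(40)]
print(separating_spheres_disjoint(X, Y))  # True
```

The check always succeeds because $\rho_x \le \rho(x, y)$ and $\rho_y \le \rho(x, y)$. This is the same inequality the proof uses in its final step.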

Remark. Every subspace of a metric space is itself a metric space and 
hence normal. This is not true for normal spaces in general, i.e., a subspace 
of a normal space need not be normal.⁴ A property of a topological space
T shared by every subspace of T is said to be hereditary. Thus normality of a 
space is not a hereditary property. These ideas are pursued in Problems 
13 and 14. 

9.6. Continuous mappings. Homeomorphisms. The concept of a continuous
mapping, introduced for metric spaces in Sec. 5.2, generalizes at once
to the case of arbitrary topological spaces. Thus, let $f$ be a mapping of one
topological space $X$ into another topological space $Y$, so that $f$ associates
an element $y = f(x) \in Y$ with each element $x \in X$. Then $f$ is said to be
continuous at the point $x_0 \in X$ if, given any neighborhood $V_{y_0}$ of the point
$y_0 = f(x_0)$, there is a neighborhood $U_{x_0}$ of the point $x_0$ such that $f(U_{x_0}) \subset
V_{y_0}$. The mapping $f$ is said to be continuous on $X$ if it is continuous at every
point of $X$. In particular, a continuous mapping of a topological space $X$
into the real line is called a continuous real function on $X$.

Remark. These definitions clearly reduce to the corresponding definitions 
for metric spaces in Sec. 5.2 if X and Y are both metric spaces. 

The notion of continuity of a mapping $f$ of one topological space into
another⁵ is easily stated in terms of open sets, i.e., in terms of the topologies
of the two spaces:

Theorem 10. A mapping $f$ of a topological space $X$ into a topological
space $Y$ is continuous if and only if the preimage $\Gamma = f^{-1}(G)$ of every
open set $G \subset Y$ is open (in $X$).

Proof. Suppose $f$ is continuous on $X$, and let $G$ be any open subset
of $Y$. Choose any point $x \in \Gamma = f^{-1}(G)$, and let $y = f(x)$. Then $G$ is a
neighborhood of the point $y$. Hence, by the continuity of $f$, there is a
neighborhood $U_x$ of $x$ such that $f(U_x) \subset G$, i.e., $U_x \subset \Gamma$. In other words,
every point $x \in \Gamma$ has a neighborhood contained in $\Gamma$. But then $\Gamma$ is
open (see Problem 1).

Conversely, suppose $\Gamma = f^{-1}(G)$ is open whenever $G \subset Y$ is open.
Given any point $x \in X$, let $V_y$ be any neighborhood of the point $y = f(x)$.


⁴ See e.g., J. L. Kelley, General Topology, D. Van Nostrand Co., Inc., Princeton, N.J.
(1955), p. 132.

⁵ If desired, the mapping $f$ can always be regarded as “onto,” since otherwise we need
only replace the space $Y$ by the subspace $f(X) \subset Y$.
Then clearly $x \in f^{-1}(V_y)$, and moreover $f^{-1}(V_y)$ is open, by hypothesis.
Therefore $U_x = f^{-1}(V_y)$ is a neighborhood of $x$ such that $f(U_x) \subset V_y$.
In other words, $f$ is continuous at $x$ and hence on $X$, since $x$ is an arbitrary
point of $X$. |
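For finite spaces, the criterion of Theorem 10 and its closed-set form (Theorem 10') can both be tested directly. The following is a minimal sketch. It assumes a mapping is given as a Python dict and each topology as an explicit list of open sets. The two-point spaces and the function names are hypothetical.

```python
def preimage(f, S):
    """f is a dict describing a mapping; return the set of x with f(x) in S."""
    return frozenset(x for x, y in f.items() if y in S)

def continuous_open(f, opens_X, opens_Y):
    """Theorem 10: the preimage of every open set of Y is open in X."""
    open_X = {frozenset(U) for U in opens_X}
    return all(preimage(f, G) in open_X for G in opens_Y)

def continuous_closed(f, X, Y, opens_X, opens_Y):
    """Theorem 10': the preimage of every closed set of Y is closed in X."""
    X, Y = frozenset(X), frozenset(Y)
    closed_X = {X - frozenset(U) for U in opens_X}
    return all(preimage(f, Y - frozenset(G)) in closed_X for G in opens_Y)

# Hypothetical spaces: S = {a, b} with open sets {}, {b}, S; D = {0, 1} discrete
S, D = {"a", "b"}, {0, 1}
opens_S = [set(), {"b"}, {"a", "b"}]
opens_D = [set(), {0}, {1}, {0, 1}]
g = {"a": 0, "b": 1}   # S -> D: the preimage {a} of the open set {0} is not open
h = {0: "a", 1: "b"}   # D -> S: every map out of a discrete space is continuous
print(continuous_open(g, opens_S, opens_D), continuous_closed(g, S, D, opens_S, opens_D))  # False False
print(continuous_open(h, opens_D, opens_S), continuous_closed(h, D, S, opens_D, opens_S))  # True True
```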

Naturally, Theorem 10 has the following “dual”: 

Theorem 10'. A mapping $f$ of a topological space $X$ into a topological
space $Y$ is continuous if and only if the preimage $\Phi = f^{-1}(F)$ of every closed
set $F \subset Y$ is closed (in $X$).

Proof. Use the fact that the preimage of a complement is the
complement of the preimage. |

Remark. Let $X$ and $Y$ be two arbitrary sets, and let $f$ be a mapping of
$X$ into $Y$. Suppose that in $Y$ there is specified a topology $\tau$, i.e., a system
of sets containing $Y$ and $\emptyset$, and closed under the operations of taking
arbitrary unions and finite intersections. Then since the preimage of a
union (or intersection) of sets equals the union (or intersection) of the
preimages of the sets, by Theorems 1 and 2, p. 5, the preimage of the
topology $\tau$, i.e., the system of all sets $f^{-1}(G)$ where $G \in \tau$, is a topology
in $X$ which we denote by $f^{-1}(\tau)$.

Suppose now that $X$ and $Y$ are topological spaces, with topologies $\tau_X$
and $\tau_Y$, respectively. Then Theorem 10, giving a necessary and sufficient
condition for a mapping $f$ of $X$ into $Y$ to be continuous, can be paraphrased
as follows: A mapping $f$ of $X$ into $Y$ is continuous if and only if the topology
$\tau_X$ is stronger than the topology $f^{-1}(\tau_Y)$, i.e., $f^{-1}(\tau_Y) \subset \tau_X$.
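The preimage topology of the Remark is easy to compute on finite sets. The sketch below builds $f^{-1}(\tau_Y)$ and confirms that it satisfies the axioms of a topology. It then tests continuity as the inclusion $f^{-1}(\tau_Y) \subset \tau_X$. The helper names and example data are hypothetical. The sketch assumes finite families, for which closure under pairwise unions and intersections suffices.

```python
from itertools import combinations

def is_topology(points, family):
    """Finite check: contains the empty set and the whole set, and is closed
    under pairwise unions and intersections (enough for a finite family)."""
    fam = {frozenset(U) for U in family}
    if frozenset() not in fam or frozenset(points) not in fam:
        return False
    return all(U | V in fam and U & V in fam for U, V in combinations(fam, 2))

def preimage_topology(f, opens_Y):
    """The system f^{-1}(tau_Y) of all preimages f^{-1}(G), G open in Y."""
    return {frozenset(x for x in f if f[x] in G) for G in opens_Y}

def continuous(f, opens_X, opens_Y):
    """f is continuous iff tau_X contains f^{-1}(tau_Y)."""
    return preimage_topology(f, opens_Y) <= {frozenset(U) for U in opens_X}

X = {1, 2, 3, 4}
tau_Y = [set(), {"a"}, {"a", "b"}, {"a", "b", "c"}]   # hypothetical topology on Y
f = {1: "a", 2: "a", 3: "b", 4: "c"}
pre = preimage_topology(f, tau_Y)
print(sorted(map(sorted, pre)))  # [[], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
print(is_topology(X, pre), continuous(f, pre, tau_Y), continuous(f, [set(), X], tau_Y))
```

In particular, $f^{-1}(\tau_Y)$ is the weakest topology on $X$ for which $f$ is continuous. The last call shows that the trivial topology $\{\emptyset, X\}$ is too weak here.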

Example. It is easy to see that the image (as opposed to the preimage) of 
an open set under a continuous mapping need not be open. Similarly, the 
image of a closed set under a continuous mapping need not be closed. For 
example, consider the mapping of the half-open interval X = [0, 1) onto the 
circle of unit circumference corresponding to “winding” the interval onto 
the circle. Then the set [½, 1), which is closed in [0, 1), goes into a set which
is not closed on the circle (see Figure 12). 


[Figure 12: the interval [0, 1) wound onto the circle; the image point f(0) is marked.]



The theorem on continuity of composite functions, familiar from 
elementary calculus, has the following analogue for topological spaces: 

Theorem 11. Given topological spaces $X$, $Y$ and $Z$, suppose $f$ is a
continuous mapping of $X$ into $Y$ and $\varphi$ a continuous mapping of $Y$ into $Z$.
Then the mapping $\varphi f$, i.e., the mapping carrying $x$ into $\varphi(f(x))$, is
continuous.

Proof. Since $(\varphi f)^{-1}(G) = f^{-1}(\varphi^{-1}(G))$ for every set $G \subset Z$, this is an
immediate consequence of Theorem 10. |

Given two topological spaces $X$ and $Y$, let $f$ be a one-to-one mapping of $X$
onto $Y$, and suppose $f$ and $f^{-1}$ are both continuous. Then $f$ is called a
homeomorphic mapping or simply a homeomorphism (between $X$ and $Y$).
Two spaces $X$ and $Y$ are said to be homeomorphic if there exists a
homeomorphism between them. Homeomorphic spaces have the same topological
properties, and from the topological point of view are merely two
“representatives” of one and the same space. In fact, if $X$ and $Y$ have topologies
$\tau_X$ and $\tau_Y$, respectively, and if $f$ is a homeomorphic mapping of $X$ onto $Y$,
then $\tau_X = f^{-1}(\tau_Y)$ and $\tau_Y = f(\tau_X)$. The relation of being homeomorphic
is obviously reflexive, symmetric and transitive, and hence is an equivalence
relation. Therefore any given family of topological spaces can be partitioned
into disjoint classes of homeomorphic spaces.

Remark. Again these are the natural generalizations of the same notions 
for metric spaces, introduced in Sec. 2.2. It should be noted that two
homeomorphic metric spaces need not have the same “metric properties” (recall
Problem 9, p. 66). Note also that the topology of a metric space is uniquely
determined by its metric, but not conversely (illustrate this by an example). 

9.7. Various ways of specifying topologies. Metrizability. The most direct
and in principle the simplest way of specifying a topology in a space $T$ is to
indicate which subsets of $T$ are regarded as open. The system of all such
subsets must then satisfy properties 1) and 2) of Definition 1. By duality,
we could just as well indicate which subsets of $T$ are regarded as closed.
The system of all such subsets must then satisfy properties 1') and 2') on
p. 79. However, this method is of limited practical value. For example, in
the case of the plane it is hardly possible to give a direct description of all
open sets (as was done in Theorem 6, p. 51 for the case of the line).

A topology is often specified in a space T by giving a base for T. In 
fact, this is precisely what is done in Sec. 6 for the case of a metric space R, 
where the base for R consists of all open spheres (or even all open spheres 
with rational radii). 

Another way of specifying a topology in a space $T$ is to introduce the
notion of convergence in $T$. As noted in Sec. 9.4, this is not a universal
method. It does work, however, in the case of spaces satisfying the first
axiom of countability.⁶

Still another way of introducing a topology in a space $T$ is to specify
a closure operator in $T$, i.e., a mapping which assigns to each subset $M \subset T$
a subset $[M] \subset T$ and satisfies the four properties listed in Theorem 1,
p. 46. It can be shown that the system of complements of all sets $M \subset T$
such that $[M] = M$ is then a topology in $T$.⁷
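The construction can be illustrated on a finite set. The sketch below assumes that the four properties of Theorem 1, p. 46, are the usual Kuratowski closure axioms: $[\emptyset] = \emptyset$, $M \subset [M]$, $[[M]] = [M]$ and $[M_1 \cup M_2] = [M_1] \cup [M_2]$. It checks these axioms for a hypothetical closure operator on $\{0, 1, 2\}$ and then forms the complements of the sets with $[M] = M$.

```python
from itertools import chain, combinations

def subsets(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(pts, r) for r in range(len(pts) + 1))]

def is_closure_operator(points, cl):
    """Check the (assumed) Kuratowski axioms on every subset."""
    S = subsets(points)
    if cl(frozenset()) != frozenset():
        return False
    if any(not M <= cl(M) or cl(cl(M)) != cl(M) for M in S):
        return False
    return all(cl(A | B) == cl(A) | cl(B) for A in S for B in S)

def topology_from_closure(points, cl):
    """Open sets = complements of the sets M with [M] = M."""
    whole = frozenset(points)
    return {whole - M for M in subsets(points) if cl(M) == M}

def down_closure(M):
    """Hypothetical closure on {0, 1, 2}: [M] = {0, ..., max M}, [empty] = empty."""
    return frozenset(range(max(M) + 1)) if M else frozenset()

pts = {0, 1, 2}
print(is_closure_operator(pts, down_closure))                     # True
print(sorted(map(sorted, topology_from_closure(pts, down_closure))))
# [[], [0, 1, 2], [1, 2], [2]]
```

An operator such as $M \mapsto M \cup \{0\}$ fails the first axiom ($[\emptyset] \ne \emptyset$). The check rejects it.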

Specifying a metric in a space T is one of the most important ways of 
introducing a topology in T, but it is again far from being a universal method. 
As already noted, every metric space is normal and satisfies the first axiom 
of countability. Hence no metric can be used to introduce a topology in a 
space which fails to have these two properties. A topological space T is said 
to be metrizable if its topology can be specified by means of some metric 
(more exactly, if it is homeomorphic to some metric space). As just pointed 
out, a necessary condition for a topological space T to be metrizable is that 
it be normal and satisfy the first axiom of countability. However, it can be 
shown that these conditions are not sufficient for T to be metrizable. On the 
other hand, in the case of a space with a countable base (i.e., satisfying the 
second axiom of countability), we have 

Urysohn’s metrization theorem. A necessary and sufficient condition
for a topological space with a countable base to be metrizable is that
it be normal.

The necessity follows from Theorem 9. For the sufficiency we refer to the
literature.⁸

Problem 1. Given a topological space $T$, prove that a set $G \subset T$ is open if
and only if every point $x \in G$ has a neighborhood contained in $G$.

Problem 2. Given a topological space $T$, prove that

a) $[M] = M$ if and only if $M$ is a closed set, i.e., the complement $T - G$
of an open set $G \subset T$;

b) $[M]$ is the smallest closed set containing $M$;

c) The closure operator, i.e., the mapping carrying each set $M \subset T$ into
$[M]$, satisfies Theorem 1, p. 46.

Problem 3. Consider the set $\mathcal{T}$ of all possible topologies defined in a
set $X$, where $\tau_2 < \tau_1$ means that $\tau_2$ is weaker than $\tau_1$. Verify that $<$ is a
partial ordering of $\mathcal{T}$. Does $\mathcal{T}$ have maximal and minimal elements? If so,
what are they?

⁶ In fact, by suitably generalizing the notion of convergence (and introducing the
concepts of “nets” and “filters”), this method can be made to work quite generally. See
e.g., J. L. Kelley, op. cit., p. 83.

⁷ J. L. Kelley, op. cit., p. 43.

⁸ See e.g., P. S. Alexandroff, Einführung in die Mengenlehre und die Theorie der Reellen
Funktionen, VEB Deutscher Verlag der Wissenschaften, Berlin (1956), p. 195 ff.
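For a small set $X$ one can enumerate all of $\mathcal{T}$ and examine the ordering in Problem 3 experimentally. The sketch below lists every topology on a three-point set by brute force and picks out the maximal and minimal elements. It reads “$\tau_2$ weaker than $\tau_1$” as $\tau_2 \subset \tau_1$. The helper names are hypothetical. On three points the enumeration finds 29 topologies, with the discrete topology as the only maximal element and $\{\emptyset, X\}$ as the only minimal one.

```python
from itertools import chain, combinations

def all_subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def is_topology(whole, family):
    """Finite check: contains the empty set and the whole set, and is closed
    under pairwise unions and intersections."""
    if frozenset() not in family or whole not in family:
        return False
    return all(U | V in family and U & V in family for U in family for V in family)

def all_topologies(points):
    """Brute force over all families of proper nonempty subsets."""
    whole = frozenset(points)
    proper = [S for S in all_subsets(points) if S and S != whole]
    result = []
    for chosen in all_subsets(proper):
        family = chosen | {frozenset(), whole}
        if is_topology(whole, family):
            result.append(family)
    return result

tops = all_topologies("abc")
maximal = [t for t in tops if not any(t < s for s in tops)]   # t < s: proper inclusion
minimal = [t for t in tops if not any(s < t for s in tops)]
print(len(tops), len(maximal), len(minimal))  # 29 1 1
```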

Problem 4. Can two distinct topologies $\tau_1$ and $\tau_2$ in $X$ generate the same
relative topology in a subset $A \subset X$?

Problem 5. Let

$$X = \{a, b, c\}, \quad A = \{a, b\}, \quad B = \{b, c\},$$

and let $\mathcal{B} = \{\emptyset, X, A, B\}$. Is $\mathcal{B}$ a base for a topology in $X$?
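A family of subsets of a finite set $X$ is a base for some topology in $X$ exactly when two conditions hold. The family must cover $X$, and the intersection of any two of its members must be a union of members. This is the standard criterion, stated here as background rather than taken from the text. A brute-force sketch, with a hypothetical helper name:

```python
def is_base(points, family):
    """Standard criterion on a finite set: the family covers X, and each
    intersection of two members is the union of the members it contains."""
    whole = frozenset(points)
    fam = [frozenset(B) for B in family]
    if frozenset().union(*fam) != whole:
        return False
    for B1 in fam:
        for B2 in fam:
            inter = B1 & B2
            inside = frozenset().union(*(B for B in fam if B <= inter))
            if inside != inter:
                return False
    return True

X = {"a", "b", "c"}
print(is_base(X, [{"a"}, {"b", "c"}]))   # True
A, B = {"a", "b"}, {"b", "c"}
is_base(X, [set(), X, A, B])             # the family of Problem 5
```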

Problem 6. Prove that if M is an uncountable subset of a topological 
space with a countable base, then some point of M is a limit point of M. 

Problem 7. Prove that the topological space T in Example 4, p. 79 is 
connected. 

Comment. T might be called a “connected doubleton.” 

Problem 8. Prove that a topological space satisfying the second axiom of 
countability automatically satisfies the first axiom of countability. 

Problem 9. Give an example of a topological space satisfying the first 
axiom of countability but not the second axiom of countability. 

Problem 10. Let $\tau$ be the system of sets consisting of the empty set and
every subset of the closed unit interval $X = [0, 1]$ obtained by deleting a finite
or countable number of points from $X$. Verify that $T = (X, \tau)$ is a topological
space. Prove that $T$ satisfies neither the second nor the first axiom of
countability. Prove that $T$ is a $T_1$-space, but not a Hausdorff space.

Problem 11. Let T be the topological space of the preceding problem. 
Prove that the only convergent sequences in T are the “stationary sequences,” 
i.e., the sequences all of whose terms are the same starting from some index
$n$. Prove that the set $M = (0, 1]$ has the point 0 as a contact point, but
contains no sequence of points converging to 0.

Problem 12. Prove the converse of Theorem 8. 

Comment. Hence a topological space $T$ is a $T_1$-space if and only if every
finite subset of $T$ is closed.

Problem 13. Prove the following theorem, known as Urysohn’s lemma:
Given a normal space $T$ and two disjoint closed subsets $F_1, F_2 \subset T$, there
exists a continuous real function $f$ on $T$ such that $0 \le f(x) \le 1$ and

$$f(x) = \begin{cases} 0 & \text{if } x \in F_1, \\ 1 & \text{if } x \in F_2. \end{cases}$$
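In a metric space the function required by Urysohn’s lemma can be written down explicitly: $f(x) = \rho(x, F_1)/(\rho(x, F_1) + \rho(x, F_2))$. This is a standard special case, not the construction needed for a general normal space. The sketch below assumes finite sets $F_1$ and $F_2$ in the Euclidean plane. The names are hypothetical.

```python
import math

def dist_to_set(p, S):
    """rho(p, S) for a finite set S of points in the plane."""
    return min(math.dist(p, q) for q in S)

def urysohn_function(F1, F2):
    """Metric-space special case: f = 0 on F1, f = 1 on F2, 0 <= f <= 1.
    The denominator never vanishes because F1 and F2 are disjoint closed sets."""
    def f(x):
        d1, d2 = dist_to_set(x, F1), dist_to_set(x, F2)
        return d1 / (d1 + d2)
    return f

F1 = [(0.0, 0.0), (1.0, 0.0)]
F2 = [(5.0, 5.0)]
f = urysohn_function(F1, F2)
print(f((0.0, 0.0)), f((5.0, 5.0)))   # 0.0 1.0
```

Continuity of $f$ follows from the continuity of the distance functions $x \mapsto \rho(x, F_i)$.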
Problem 14. A $T_1$-space $T$ is said to be completely regular if, given any
closed set $F \subset T$ and any point $x_0 \in T - F$, there exists a continuous real
function $f$ such that $0 \le f(x) \le 1$ and

$$f(x) = \begin{cases} 0 & \text{if } x = x_0, \\ 1 & \text{if } x \in F. \end{cases}$$

(Completely regular spaces are also called Tychonoff spaces.) Prove that
every normal space is completely regular, but not conversely. Prove that
every subspace of a completely regular space (in particular, of a normal space)
is completely regular.

Comment. Thus, unlike normality, complete regularity is a hereditary 
property. It can be shown that a space is completely regular if and only if 
it is a subspace of a normal space.⁹ Completely regular spaces are particularly
important in analysis, since they “are able to support sufficiently many 
continuous functions,” i.e., for any two distinct points x and y of a completely 
regular space T, there is a continuous real function on T taking distinct 
values at x and y. 


10. Compactness

10.1. Compact topological spaces. The reader has presumably already 
encountered the familiar 

Heine-Borel theorem. Any cover of a closed interval [a, b] by a system
of open intervals (or, more generally, open sets) has a finite subcover. 

Generalizing this property of closed intervals, we are led to a key concept 
of real analysis: 

Definition 1. A topological space T is said to be compact if every open 
cover of T has a finite subcover. A compact Hausdorff space is called a 
compactum. 

Example. As we will see in Sec. 11.2, any closed bounded subset of 
Euclidean n-space R n is compact, for arbitrary n. On the other hand, R n 
itself (e.g., the real line or three-dimensional space) is not compact. 

Definition 2. A system of subsets $\{A_\alpha\}$ of a set $T$ is said to be centered
if every finite intersection $\bigcap_{k=1}^{n} A_k$ is nonempty.¹⁰

⁹ J. L. Kelley, op. cit., p. 145.

¹⁰ A system of sets with typical member $A_\alpha$ will often be denoted by $\{A_\alpha\}$ (this is still
another use of curly brackets).


Theorem 1. A topological space T is compact if and only if it has the 
following property : 

A) Every centered system of closed subsets of T has a nonempty 
intersection. 

Proof. Suppose $T$ is compact, and let $\{F_\alpha\}$ be any centered system of
closed subsets of $T$. Then the sets $G_\alpha = T - F_\alpha$ are open. Hence the fact
that no finite intersection $\bigcap_{k=1}^{n} F_k$ is empty implies that no finite system of
sets $G_k = T - F_k$ covers $T$. But then the whole system of sets $\{G_\alpha\}$ cannot
cover $T$, by the compactness, and hence $\bigcap_\alpha F_\alpha \ne \emptyset$. In other words,
$T$ has property A) if $T$ is compact.

Conversely, suppose $T$ has property A), and let $\{G_\alpha\}$ be any open
cover of $T$. Setting $F_\alpha = T - G_\alpha$, we find that $\bigcap_\alpha F_\alpha = \emptyset$, which, by
property A), implies that the system $\{F_\alpha\}$ is not centered, i.e., that there
are sets $F_1, \ldots, F_n$ such that $\bigcap_{k=1}^{n} F_k = \emptyset$. But then the corresponding
open sets $G_k = T - F_k$ form a finite subcover of the cover $\{G_\alpha\}$. In
other words, $T$ is compact if $T$ has property A). |

Theorem 2. Every closed subset F of a compact topological space T is 
itself compact. 

Proof. Let $\{F_\alpha\}$ be any centered system of closed subsets of the
subspace $F \subset T$. Then every $F_\alpha$ is closed in $T$ as well, since $F$ itself is
closed in $T$, i.e., $\{F_\alpha\}$ is a centered system of closed subsets of $T$.
Therefore $\bigcap_\alpha F_\alpha \ne \emptyset$, by Theorem 1. But then $F$ is compact, by
Theorem 1 again. |

Corollary. Every closed subset of a compactum is itself a compactum. 

Proof. Use Theorem 2 and the fact that every subset of a Hausdorff 
space is itself a Hausdorff space. | 

Theorem 3. Let $K$ be a compactum and $T$ any Hausdorff space
containing $K$. Then $K$ is closed in $T$.

Proof. Suppose $y \notin K$, so that $y \in T - K$. Then, given any point
$x \in K$, there is a neighborhood $U_x$ of $x$ and a neighborhood $V_x$ of $y$ such
that

$$U_x \cap V_x = \emptyset.$$

The neighborhoods $\{U_x\}$ $(x \in K)$ form an open cover of $K$. Hence, by the
compactness of $K$, $\{U_x\}$ has a finite subcover consisting of sets $U_{x_1}, \ldots,
U_{x_n}$. Let

$$V = V_{x_1} \cap \cdots \cap V_{x_n}.$$

Then $V$ is a neighborhood of the point $y$ which does not intersect the set
$U_{x_1} \cup \cdots \cup U_{x_n} \supset K$, and hence $y \notin [K]$. It follows that $K$ is closed
(in $T$). |

Remark. It is a consequence of Theorems 2 and 3 that compactness is 
an “intrinsic property,” in the sense that a compactum remains a compactum 
after being “embedded” in any larger Hausdorff space. 

Theorem 4. Every compactum K is a normal space. 

Proof. Let $X$ and $Y$ be any two disjoint closed subsets of $K$.
Repeating the argument given in the proof of Theorem 3, we easily see that,
given any point $y \in Y$, there exists a neighborhood $U_y$ containing $y$ and
an open set $O_y \supset X$ such that $U_y \cap O_y = \emptyset$. Since $Y$ is compact, by
Theorem 2, the cover $\{U_y\}$ $(y \in Y)$ of the set $Y$ has a finite subcover
$U_{y_1}, \ldots, U_{y_n}$. The open sets

$$O^{(1)} = O_{y_1} \cap \cdots \cap O_{y_n}, \qquad O^{(2)} = U_{y_1} \cup \cdots \cup U_{y_n}$$

then satisfy the normality conditions

$$O^{(1)} \supset X, \qquad O^{(2)} \supset Y, \qquad O^{(1)} \cap O^{(2)} = \emptyset.$$

|

10.2. Continuous mappings of compact spaces. Next we show that the 
“continuous image” of a compact space is itself a compact space: 

Theorem 5. Let $X$ be a compact space and $f$ a continuous mapping of $X$
onto a topological space $Y$. Then $Y = f(X)$ is itself compact.

Proof. Let $\{V_\alpha\}$ be any open cover of $Y$, and let $U_\alpha = f^{-1}(V_\alpha)$. Then
the sets $U_\alpha$ are open (being preimages of open sets under a continuous
mapping) and cover the space $X$. Since $X$ is compact, $\{U_\alpha\}$ has a finite
subcover $U_{\alpha_1}, \ldots, U_{\alpha_n}$. Then the sets $V_{\alpha_1}, \ldots, V_{\alpha_n}$, where $V_{\alpha_k} = f(U_{\alpha_k})$
because $f$ is onto, cover $Y$. It follows that $Y$ is compact. |

Theorem 6. A one-to-one continuous mapping of a compactum X 
onto a compactum Y is necessarily a homeomorphism. 

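Proof. We must show that the inverse mapping $f^{-1}$ is itself continuous.
Let $F$ be a closed set in $X$ and $P = f(F)$ its image in $Y$. Then $F$ is compact,
by Theorem 2, so that $P$ is compact, by Theorem 5; moreover, $P$ is Hausdorff,
being a subspace of the Hausdorff space $Y$. Thus $P$ is a compactum, and hence,
by Theorem 3, $P$ is closed in $Y$. Therefore the preimage under $f^{-1}$ of any closed
set $F \subset X$ is closed. It follows from Theorem 10', p. 88 that $f^{-1}$ is continuous. |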

10.3. Countable compactness. We begin by proving an important property 
of compact spaces: 

Theorem 7. If $T$ is a compact space, then any infinite subset of $T$ has
at least one limit point.


Proof. Suppose $T$ contains an infinite set $X$ with no limit point. Then
$T$ contains a countable set

$$X = \{x_1, x_2, \ldots, x_n, \ldots\}$$

with no limit point. But then the sets

$$X_n = \{x_n, x_{n+1}, \ldots\} \qquad (n = 1, 2, \ldots)$$

(each closed, since it has no limit points) form a centered system of closed
sets in $T$ with an empty intersection, i.e., $T$ is not compact. |

These considerations suggest 

Definition 3. A topological space T is said to be countably compact 
if every infinite subset of T has at least one limit point (in T). 

Thus Theorem 7 says that every compact set is countably compact. The 
converse, however, is not true (see Problem 1). The relation between the 
concepts of compactness and countable compactness is made clear by 

Theorem 8. Each of the following two conditions is necessary and 
sufficient for a topological space T to be countably compact: 

1) Every countable open cover of T has a finite subcover; 

2) Every countable centered system of closed subsets of T has a nonempty
intersection.


Proof. The equivalence of conditions 1) and 2) is an immediate
consequence of the duality principle. Moreover, if $T$ is not countably
compact, then, repeating the argument given in proving Theorem 7,
we find that there is a countable centered system of closed subsets of $T$
with an empty intersection. This proves the sufficiency of condition 2).
Thus we need only prove the necessity of condition 2). Let $T$ be
countably compact, and let $\{F_n\}$ be a countable centered system of
closed sets in $T$. Then, as we now show, $\bigcap_n F_n \ne \emptyset$. Let

$$\Phi_n = \bigcap_{k=1}^{n} F_k.$$

Then none of the $\Phi_n$ is empty, since $\{F_n\}$ is centered. Moreover,

$$\Phi_1 \supset \Phi_2 \supset \cdots \supset \Phi_n \supset \cdots$$

and

$$\bigcap_n \Phi_n = \bigcap_n F_n.$$

There are now just two possibilities:

1) $\Phi_{n_0} = \Phi_{n_0+1} = \cdots$ starting from some index $n_0$, in which case it
is obvious that $\bigcap_n \Phi_n = \Phi_{n_0} \ne \emptyset$.

2) There are infinitely many distinct sets $\Phi_n$. In this case, there is
clearly no loss of generality in assuming that all the $\Phi_n$ are distinct.
Let $x_n \in \Phi_n - \Phi_{n+1}$. Then the sequence $\{x_n\}$ consists of infinitely
many distinct points of $T$, and hence, by the countable compactness of $T$,
must have at least one limit point, say $x_0$. But then $x_0$
must be a limit point of $\Phi_n$, since $\Phi_n$ contains all the points $x_n,
x_{n+1}, \ldots$. Moreover $x_0 \in \Phi_n$, since $\Phi_n$ is closed. It follows that

$$x_0 \in \bigcap_n \Phi_n, \quad \text{i.e.,} \quad \bigcap_n \Phi_n \ne \emptyset.$$

|

Thus compact topological spaces are those in which an arbitrary open 
cover has a finite subcover, while countably compact spaces are those in 
which every countable open cover has a finite subcover. Although in general 
countable compactness does not imply compactness, we have the following 
important special situation: 

Theorem 9. The concepts of compactness and countable compactness 
coincide for a topological space T with a countable base. 

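Proof. By Theorem 6, p. 83, every open cover $\mathcal{O}$ of $T$ has a finite or
countable subcover. Hence, if $T$ is countably compact, $\mathcal{O}$ has a finite subcover,
by Theorem 8. Conversely, every compact space is countably compact, by
Theorem 7. |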

Remark. The concept of a countably compact topological space, unlike 
that of a compact space, has not turned out to be very natural or fruitful. 
Its presence in mathematics can be explained in terms of a kind of “historical 
inertia.” The point is that, as will be shown in the next section, the concepts 
of compactness and countable compactness coincide for metric spaces, as 
well as for spaces with a countable base. The notion of compactness was 
originally introduced in connection with metric spaces, with a compact metric
space being defined as one in which every infinite subset has at least one 
limit point (i.e., in terms of what is now called “countable compactness”). 
The “automatic transcription” of this definition from metric spaces to 
topological spaces then led to the concept of a countably compact topological 
space. Sometimes, especially in the older literature, the word “compact” 
is used in the sense of “countably compact,” and a topological space compact 
in our sense (i.e., such that every open cover has a finite subcover) is said 
to be “bicompact.” In this older language, a compact Hausdorff space
(a “compactum” in our terminology) is called a “bicompactum,” and the 
term “compactum” is reserved for a compact metric space. We will adhere 
to the terminology introduced in Definitions 1 and 3, often using the term 
“metric compactum” to designate a compact metric space. 

10.4. Relatively compact subsets. Among the subsets of a topological 
space, those whose closures are compact are of special interest: 
Definition 4. A subset $M$ of a topological space $T$ is said to be relatively
compact (in $T$) if its closure $[M]$ in $T$ is compact.

Example 1. According to Theorem 2, every subset of a compact topological
space is relatively compact.

Example 2. As we will see in Sec. 11.3, every bounded subset of the real 
line R 1 (or more generally of Euclidean n-space R n ) is relatively compact. 

A related concept is given by 

Definition 5. A subset $M$ of a topological space $T$ is said to be relatively
countably compact (in $T$) if every infinite subset $A \subset M$ has at least
one limit point in $T$ (which may or may not belong to $M$).

Relative compactness (unlike compactness) is not an “intrinsic property,” 
i.e., it depends on the space T in which the given set M is “embedded.”
For example, the set of all rational numbers in the interval (0, 1) is relatively 
compact if regarded as a subset of the real line, but not if regarded as a subset 
of the space of all rational numbers. The concept of relative compactness 
is most important in the case of metric spaces (see Sec. 11.3). 

Problem 1. Let $X$ be the set of all ordinal numbers less than the first
uncountable ordinal. Let $(\alpha, \beta) \subset X$ denote the set of all ordinal numbers
$\gamma$ such that $\alpha < \gamma < \beta$, and let the open sets in $X$ be all unions of intervals
$(\alpha, \beta)$. Prove that the resulting topological space is countably compact but
not compact.

Problem 2. A topological space $T$ is said to be locally compact if every
point $x \in T$ has at least one relatively compact neighborhood. Show that a
compact space is automatically locally compact, but not conversely. Prove
that every closed subspace of a locally compact space is locally compact.

Problem 3. A point x is said to be a complete limit point of a subset A of a 
topological space if, given any neighborhood U of x, the sets A and A n U 
have the same power (i.e., cardinal number). Prove that every infinite subset 
of a compact topological space has at least one complete limit point. 

Comment. Conversely, it can be shown that if every infinite subset of a
topological space $T$ has at least one complete limit point, then $T$ is compact.¹¹

11. Compactness in Metric Spaces

11.1. Total boundedness. Since metric spaces are topological spaces of a
special kind, the definitions and results of the preceding section apply to
metric spaces as well. However, in the case of metric spaces, the concept
of compactness is intimately connected with another concept, known as
total boundedness.

¹¹ P. S. Alexandroff, op. cit., pp. 250-251; J. L. Kelley, op. cit., pp. 163-164.

Definition 1. Let $R$ be a metric space and $\varepsilon$ any positive number. Then
a set $A \subset R$ is said to be an $\varepsilon$-net for a set $M \subset R$ if, for every $x \in M$,
there is at least one point $a \in A$ such that $\rho(x, a) \le \varepsilon$.

Example 1. The set of all points with integral coordinates is a (1/V2)-net. 

Definition 2. Given a metric space $R$ and a subset $M \subset R$, suppose $M$ has a finite $\varepsilon$-net for every $\varepsilon > 0$. Then $M$ is said to be totally bounded.

Example 2. Every subset of a totally bounded set is itself totally bounded.

If a set $M$ is totally bounded, then obviously so is its closure $[M]$. Every totally bounded set is automatically bounded, being the union of a finite number of bounded sets (recall Problem 5, p. 65). The converse is not true, as shown in Example 4.

Example 3. In Euclidean $n$-space $R^n$, total boundedness is equivalent to boundedness. In fact, if $M \subset R^n$ is bounded, then $M$ is contained in some sufficiently large cube $Q$. Partitioning $Q$ into smaller cubes of side $\varepsilon$, we find that the vertices of the little cubes form a finite $(\sqrt{n}\,\varepsilon/2)$-net for $Q$ and hence (a fortiori) for any set contained in $Q$.
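The grid construction of Example 3, and its planar special case in Example 1, can be spot-checked numerically. The following is only a sketch: the helper names `grid_net` and `distance_to_net` are our own, and random sampling merely tests the covering bound $\sqrt{n}\,\varepsilon/2$ at finitely many points rather than proving it.

```python
import itertools
import math
import random


def grid_net(lower, upper, side):
    """Vertices of a partition of the box [lower, upper] into cells of side <= side.

    lower, upper: per-coordinate bounds of the box (sequences of equal length).
    """
    axes = []
    for lo, hi in zip(lower, upper):
        cells = max(1, math.ceil((hi - lo) / side))  # number of cells on this axis
        step = (hi - lo) / cells                     # actual cell side, at most `side`
        axes.append([lo + i * step for i in range(cells + 1)])
    return [tuple(p) for p in itertools.product(*axes)]


def distance_to_net(x, net):
    """Euclidean distance from the point x to the nearest point of a finite net."""
    return min(math.dist(tuple(x), a) for a in net)


# Example 1: the integer lattice in the plane; the worst points are cell centers.
lattice = grid_net([-2.0, -2.0], [2.0, 2.0], 1.0)
print(distance_to_net((0.5, 0.5), lattice), 1 / math.sqrt(2))

# Example 3: a cube in R^3 cut into cells of side 0.25; spot-check the bound.
random.seed(0)
n, side = 3, 0.25
net = grid_net([-1.0] * n, [1.0] * n, side)
worst = max(distance_to_net([random.uniform(-1, 1) for _ in range(n)], net)
            for _ in range(500))
print(len(net), worst <= side * math.sqrt(n) / 2)
```

Each coordinate of a point of the box lies within half a cell side of some grid value, which is exactly where the factor $\sqrt{n}/2$ comes from.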

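Example 4. The unit sphere $\Sigma$ in $l_2$, with equation
$$\sum_{n=1}^{\infty} x_n^2 = 1,$$
is bounded but not totally bounded. In fact, consider the points
$$e_1 = (1, 0, 0, \ldots), \quad e_2 = (0, 1, 0, \ldots), \quad \ldots,$$
where the $n$th coordinate of $e_n$ is one and the others are all zero. These points all lie on $\Sigma$, and the distance between any two of them is $\sqrt{2}$. Hence a closed sphere of radius $\varepsilon < \sqrt{2}/2$ contains at most one of the points $e_n$, so $\Sigma$ cannot have a finite $\varepsilon$-net with $\varepsilon < \sqrt{2}/2$.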
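A finite truncation makes the obstruction in Example 4 concrete. In the sketch below (helper names are ours), $e_1, \ldots, e_{50}$ are stored by their first 50 coordinates. This is exact, since all later coordinates vanish. The code checks that every pairwise distance equals $\sqrt{2}$, and that a closed sphere about the midpoint of $e_1$ and $e_2$ captures none of the $e_n$ for a radius just below $\sqrt{2}/2$ but two of them for a radius just above.

```python
import itertools
import math


def unit_vector(n, dim):
    """First `dim` coordinates of e_(n+1) (0-based index n); the rest are 0."""
    return [1.0 if k == n else 0.0 for k in range(dim)]


def l2_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def count_within(center, eps, points):
    """How many of `points` lie in the closed sphere S[center, eps]."""
    return sum(1 for p in points if l2_distance(center, p) <= eps)


dim = 50
basis = [unit_vector(n, dim) for n in range(dim)]
pairwise = [l2_distance(p, q) for p, q in itertools.combinations(basis, 2)]
print(min(pairwise), max(pairwise), math.sqrt(2))

# The midpoint of e_1 and e_2 is at distance sqrt(2)/2 from both of them.
midpoint = [0.5, 0.5] + [0.0] * (dim - 2)
print(count_within(midpoint, 0.70, basis), count_within(midpoint, 0.71, basis))
```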

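Example 5. Let $\Pi$ be the set of points $x = (x_1, x_2, \ldots, x_n, \ldots)$ in $l_2$ satisfying the inequalities
$$|x_1| \le 1, \quad |x_2| \le \frac{1}{2}, \quad \ldots, \quad |x_n| \le \frac{1}{2^{n-1}}, \quad \ldots$$
The set $\Pi$, called the Hilbert cube (or fundamental parallelepiped),¹² furnishes an example of an infinite-dimensional totally bounded set. The fact that $\Pi$ is totally bounded can be seen as follows. Given any $\varepsilon > 0$, choose $n$ such that
$$\frac{1}{2^{n-1}} < \frac{\varepsilon}{2},$$
and with each point
$$x = (x_1, x_2, \ldots, x_n, \ldots)$$
in $\Pi$ associate the point
$$x^* = (x_1, x_2, \ldots, x_n, 0, 0, \ldots) \tag{1}$$
($x^*$ is also a point in $\Pi$). Then
$$\rho(x, x^*) = \sqrt{\sum_{k=n+1}^{\infty} x_k^2} \le \sqrt{\sum_{k=n}^{\infty} \frac{1}{4^k}} < \frac{1}{2^{n-1}} < \frac{\varepsilon}{2}.$$
But the set $\Pi^*$ of all points in $\Pi$ of the form (1) is totally bounded, being a bounded set in $n$-space. Let $A$ be a finite $(\varepsilon/2)$-net in $\Pi^*$. Then $A$ is a finite $\varepsilon$-net for the whole set $\Pi$: given $x \in \Pi$, choose $a \in A$ with $\rho(x^*, a) \le \varepsilon/2$; then $\rho(x, a) \le \rho(x, x^*) + \rho(x^*, a) < \varepsilon$.

¹² Another commonly encountered definition of the Hilbert cube is the set of points in $l_2$ satisfying the inequalities
$$|x_1| \le 1, \quad |x_2| \le \frac{1}{2}, \quad \ldots, \quad |x_n| \le \frac{1}{n}, \quad \ldots$$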
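The two-stage argument for the Hilbert cube (truncate after $n$ coordinates, then use a grid in the box $\Pi^*$) translates directly into a finite construction. The sketch below is an illustration under our own choices. The names are hypothetical. The grid side $\varepsilon/\sqrt{n}$ is taken from Example 3 so that the grid is an $(\varepsilon/2)$-net of $\Pi^*$. Sample points are drawn with only finitely many nonzero coordinates.

```python
import itertools
import math
import random


def truncation_index(eps):
    """Smallest n with 1/2**(n-1) < eps/2, as chosen in Example 5."""
    n = 1
    while 1 / 2 ** (n - 1) >= eps / 2:
        n += 1
    return n


def hilbert_cube_net(eps):
    """A finite eps-net for the Hilbert cube: grid points of the n-dimensional box Pi*."""
    n = truncation_index(eps)
    side = eps / math.sqrt(n)  # a grid of this side is an (eps/2)-net of Pi* (Example 3)
    axes = []
    for k in range(1, n + 1):
        r = 1 / 2 ** (k - 1)                     # bound on |x_k|
        cells = max(1, math.ceil(2 * r / side))
        step = 2 * r / cells
        axes.append([-r + i * step for i in range(cells + 1)])
    return list(itertools.product(*axes))


def l2_distance(x, a):
    """l_2 distance between a point x and a net point a padded with zeros."""
    padded = list(a) + [0.0] * (len(x) - len(a))
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, padded)))


def sample_point(length):
    """A random point of the Hilbert cube whose coordinates vanish after `length`."""
    return [random.uniform(-1, 1) / 2 ** k for k in range(length)]


random.seed(0)
for eps in (1.0, 0.5):
    net = hilbert_cube_net(eps)
    worst = max(min(l2_distance(x, a) for a in net)
                for x in (sample_point(60) for _ in range(100)))
    print(eps, len(net), worst < eps)
```

The truncation error is strictly below $\varepsilon/2$ and the grid error is at most $\varepsilon/2$, so every sampled point should be strictly within $\varepsilon$ of the net.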


11.2. Compactness and total boundedness. We now show the connection 
between the concepts of compactness (of both kinds) and total boundedness: 

Theorem 1. Every countably compact metric space R is totally bounded. 

Proof. Suppose $R$ is not totally bounded. Then there is an $\varepsilon_0 > 0$ such that $R$ has no finite $\varepsilon_0$-net. Choose any point $a_1 \in R$. Then $R$ contains at least one point, say $a_2$, such that
$$\rho(a_1, a_2) > \varepsilon_0,$$
since otherwise $a_1$ would be an $\varepsilon_0$-net for $R$. Moreover, $R$ contains a point $a_3$ such that
$$\rho(a_1, a_3) > \varepsilon_0, \qquad \rho(a_2, a_3) > \varepsilon_0,$$
since otherwise the pair $a_1, a_2$ would be an $\varepsilon_0$-net for $R$. More generally, once having found the points $a_1, a_2, \ldots, a_n$, we choose $a_{n+1} \in R$ such that
$$\rho(a_k, a_{n+1}) > \varepsilon_0 \qquad (k = 1, 2, \ldots, n).$$
This construction gives an infinite sequence of distinct points $a_1, a_2, \ldots, a_n, \ldots$ with no limit points, since $\rho(a_j, a_k) > \varepsilon_0$ if $j \ne k$. But then $R$ cannot be countably compact. ■
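The construction in this proof is also a practical recipe for building $\varepsilon$-nets: a point is kept only if it lies at distance at least $\varepsilon$ from every point kept so far. The sketch below applies it to a finite cloud of points in the unit square. The function name and the sample data are our own choices. Every rejected point lies within $\varepsilon$ of a kept point, so the kept points form an $\varepsilon$-net. Because the square is totally bounded, the number of kept points stays bounded; in a space that is not totally bounded, such as the unit sphere of Example 4, the process can go on forever, which is exactly what the proof exploits.

```python
import math
import random


def greedy_separated_net(points, eps):
    """Keep a point only if it is at distance >= eps from every point kept so far.

    The kept points are pairwise at least eps apart, and every discarded point is
    within eps of some kept point, so the kept points form an eps-net for `points`.
    """
    net = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in net):
            net.append(p)
    return net


random.seed(0)
cloud = [(random.random(), random.random()) for _ in range(5000)]
eps = 0.1
net = greedy_separated_net(cloud, eps)
print(len(net))
```

Since open disks of radius $\varepsilon/2$ about the kept points are disjoint and lie in a square of side $1 + \varepsilon$, at most $4(1+\varepsilon)^2/(\pi\varepsilon^2)$ points can be kept, about 154 for $\varepsilon = 0.1$.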

Corollary 1. Every countably compact metric space has a countable 
everywhere dense subset and a countable base. 

Proof. Since $R$ is totally bounded, by Theorem 1, $R$ has a finite $(1/n)$-net for every $n = 1, 2, \ldots$. The union of all these nets is then a countable everywhere dense subset of $R$. It follows from Theorem 5, p. 82 that $R$ has a countable base. ■

Corollary 2. Every countably compact metric space is compact. 

Proof. An immediate consequence of Corollary 1 and Theorem 9, p. 96. ■

According to Theorem 1, total boundedness is a necessary condition for 
a metric space to be compact. However, this condition is not sufficient. For 
example, the set of rational points in the interval [0, 1] with the ordinary 
definition of distance forms a metric space R which is totally bounded but 
not compact. In fact, the sequence of points 

0, 0.4, 0.41, 0.414, 0.4142,... 

in $R$, i.e., the sequence of decimal approximations to the irrational number $\sqrt{2} - 1$, has no limit point in $R$. Necessary and sufficient conditions for compactness of a metric space are given by

Theorem 2. A metric space R is compact if and only if it is totally 
bounded and complete. 

Proof. To see that compactness of $R$ implies completeness of $R$, we need only note that if $R$ has a Cauchy sequence $\{x_n\}$ with no limit, then $\{x_n\}$ has no limit points in $R$. This, together with Theorem 1, shows that $R$ is totally bounded and complete if $R$ is compact.

Conversely, suppose $R$ is totally bounded and complete, and let $\{x_n\}$ be any infinite sequence of distinct points in $R$. Let $N_1$ be a finite 1-net for $R$, and construct a closed sphere of radius 1 about every point of $N_1$. Since these spheres cover $R$ and there are only finitely many of them, at least one of the spheres, say $S_1$, contains an infinite subsequence
$$x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)}, \ldots$$
of the sequence $\{x_n\}$. Let $N_2$ be a finite $\tfrac{1}{2}$-net for $R$, and construct a closed sphere of radius $\tfrac{1}{2}$ about every point of $N_2$. Then at least one of these spheres, say $S_2$, contains an infinite subsequence
$$x_1^{(2)}, x_2^{(2)}, \ldots, x_n^{(2)}, \ldots$$
of the sequence $\{x_n^{(1)}\}$. Continue this construction indefinitely, finding a closed sphere $S_3$ of radius $\tfrac{1}{4}$ containing an infinite subsequence
$$x_1^{(3)}, x_2^{(3)}, \ldots, x_n^{(3)}, \ldots$$
of the sequence $\{x_n^{(2)}\}$, and so on, where $S_n$ has radius $1/2^{n-1}$. Let $S'_n$ be the closed sphere with the same center as $S_n$ but with a radius $r_n$ three times as large (i.e., equal to $3/2^{n-1}$). Since $S_{n+1}$ and $S_n$ have points in common (namely the terms of the subsequence lying in $S_{n+1}$), the distance between their centers is at most $1/2^{n-1} + 1/2^n$, so every point of $S'_{n+1}$ lies within $1/2^{n-1} + 1/2^n + 3/2^n = 3/2^{n-1}$ of the center of $S_n$. Hence
$$S'_1 \supset S'_2 \supset \cdots \supset S'_n \supset \cdots,$$
and moreover $r_n \to 0$ as $n \to \infty$. Since $R$ is complete, it follows from the nested sphere theorem (Theorem 2, p. 60) that
$$\bigcap_{n=1}^{\infty} S'_n \ne \varnothing.$$
In fact, there is a point $x_0 \in R$ such that
$$\bigcap_{n=1}^{\infty} S'_n = \{x_0\}$$
(recall Problem 3, p. 65). Clearly $x_0$ is a limit point of the original sequence $\{x_n\}$, since every neighborhood of $x_0$ contains some sphere $S_k$ and hence some infinite subsequence $\{x_n^{(k)}\}$. Therefore every infinite sequence $\{x_n\}$ of distinct points of $R$ has a limit point in $R$. It follows that $R$ is countably compact and hence compact, by Corollary 2. ■
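Completeness cannot be dropped from Theorem 2, as the rational counterexample given before the theorem shows. The sketch below generates the truncations $0, 0.4, 0.41, \ldots$ of $\sqrt{2} - 1$ as exact fractions; the function name `truncation` is ours, and `math.isqrt` supplies $\lfloor \sqrt{2}\cdot 10^k \rfloor$ in exact integer arithmetic. For the terms computed it checks that consecutive terms differ by less than $10^{-k}$, so the sequence is Cauchy, and that $(t_k + 1)^2$ stays below 2 while approaching it, so the only candidate limit, $\sqrt{2} - 1$, lies outside the rationals.

```python
from fractions import Fraction
from math import isqrt


def truncation(k):
    """sqrt(2) - 1 truncated to k decimal digits, as an exact rational number."""
    scale = 10 ** k
    # isqrt(2 * scale**2) == floor(sqrt(2) * scale), computed exactly in integers
    return Fraction(isqrt(2 * scale * scale) - scale, scale)


terms = [truncation(k) for k in range(25)]
print([float(t) for t in terms[:5]])

for k in range(24):
    # Cauchy: successive truncations differ by less than 10**-k
    assert abs(terms[k + 1] - terms[k]) < Fraction(1, 10 ** k)
    # (t_k + 1)**2 stays below 2 and within 3 * 10**-k of it
    gap = 2 - (terms[k] + 1) ** 2
    assert 0 < gap < Fraction(3, 10 ** k)
```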

Example. As already noted, a subset $M$ of Euclidean $n$-space $R^n$ is totally bounded if and only if it is bounded. Moreover, $M$ is complete if and only if it is closed (recall Problem 7, p. 66). Hence, by Theorem 2, the set of all compact subsets of $R^n$ coincides with the set of all closed bounded subsets of $R^n$.

11.3. Relatively compact subsets of a metric space. The concept of relative 
compactness, introduced in Sec. 10.4 for subsets of an arbitrary topological 
space, applies in particular to subsets of a metric space. In the case of a 
metric space, however, there is no longer any distinction between relative 
compactness and relative countable compactness. 

Theorem 3. A subset M of a complete metric space R is relatively 
compact if and only if it is totally bounded. 

Proof. An immediate consequence of Theorem 2 and the fact that a closed subset of a complete metric space is itself complete. ■

Example. Any bounded subset of Euclidean $n$-space is totally bounded and hence relatively compact (this is our version of the familiar Bolzano-Weierstrass theorem).

Remark. The utility of Theorem 3 stems from the fact that it is usually easier to prove that a set is totally bounded than to give a direct proof of its relative compactness. On the other hand, compactness is the key property as far as applications are concerned.

11.4. Arzelà's theorem. The problem of proving the compactness of various subsets of a given metric space is encountered quite frequently in analysis. However, the direct application of Theorem 2 is not always easy. This explains the need for special criteria serving as practical tools for proving compactness in particular spaces. For example, as we have seen, the boundedness of a set in Euclidean $n$-space implies its relative compactness, but this implication fails in more general metric spaces (recall the unit sphere of $l_2$ in Example 4, Sec. 11.1).

One of the most important metric spaces in analysis is the function space $C_{[a,b]}$, introduced in Example 6, p. 39. For subsets of this space, we have an important and frequently used criterion for relative compactness, called Arzelà's theorem, which will be stated and proved after first introducing two new concepts:


Definition 3. A family $\Phi$ of functions $\varphi$ defined on a closed interval $[a, b]$ is said to be uniformly bounded if there exists a number $K > 0$ such that
$$|\varphi(x)| \le K$$
for all $x \in [a, b]$ and all $\varphi \in \Phi$.

Definition 4. A family $\Phi$ of functions $\varphi$ defined on a closed interval $[a, b]$ is said to be equicontinuous if, given any $\varepsilon > 0$, there exists a number $\delta > 0$ such that $|x' - x''| < \delta$ implies
$$|\varphi(x') - \varphi(x'')| < \varepsilon$$
for all $x', x'' \in [a, b]$ and all $\varphi \in \Phi$.

Theorem 4 (Arzelà). A necessary and sufficient condition for a family $\Phi$ of continuous functions $\varphi$ defined on a closed interval $[a, b]$ to be relatively compact in $C_{[a,b]}$ is that $\Phi$ be uniformly bounded and equicontinuous.

Proof. We give the proof in two steps: 

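Step 1 (Necessity). Suppose $\Phi$ is relatively compact in $C_{[a,b]}$. Then, by Theorem 3, given any $\varepsilon > 0$, there is a finite $(\varepsilon/3)$-net $\varphi_1, \ldots, \varphi_n$ in $\Phi$ (see Problem 1). Being a continuous function defined on a closed interval, each $\varphi_i$ is bounded:
$$|\varphi_i(x)| \le K_i \qquad (a \le x \le b).$$
Let
$$K = \max\{K_1, \ldots, K_n\} + \frac{\varepsilon}{3}.$$
By the definition of an $(\varepsilon/3)$-net, given any $\varphi \in \Phi$, there is at least one $\varphi_i$ such that
$$\rho(\varphi, \varphi_i) = \max_{a \le x \le b} |\varphi(x) - \varphi_i(x)| \le \frac{\varepsilon}{3}.$$
Therefore
$$|\varphi(x)| \le |\varphi_i(x)| + \frac{\varepsilon}{3} \le K_i + \frac{\varepsilon}{3} \le K,$$
i.e., $\Phi$ is uniformly bounded. Moreover, each function $\varphi_i$ in the $(\varepsilon/3)$-net is continuous, and hence uniformly continuous, on $[a, b]$. Hence there is a $\delta_i > 0$ such that
$$|\varphi_i(x_1) - \varphi_i(x_2)| < \frac{\varepsilon}{3}$$
whenever $|x_1 - x_2| < \delta_i$. Let
$$\delta = \min\{\delta_1, \ldots, \delta_n\}.$$
Then, given any $\varphi \in \Phi$ and choosing $\varphi_i$ such that $\rho(\varphi, \varphi_i) \le \varepsilon/3$, we have
$$|\varphi(x_1) - \varphi(x_2)| \le |\varphi(x_1) - \varphi_i(x_1)| + |\varphi_i(x_1) - \varphi_i(x_2)| + |\varphi_i(x_2) - \varphi(x_2)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon$$
whenever $|x_1 - x_2| < \delta$. This proves the equicontinuity of $\Phi$.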

Step 2 (Sufficiency). Suppose $\Phi$ is uniformly bounded and equicontinuous. According to Theorem 3, to prove that $\Phi$ is relatively compact in $C_{[a,b]}$, we need only show that $\Phi$ is totally bounded, i.e., that given any $\varepsilon > 0$, there exists a finite $\varepsilon$-net for $\Phi$ in $C_{[a,b]}$. Suppose
$$|\varphi(x)| \le K$$
for all $\varphi \in \Phi$, and let $\delta > 0$ be such that
$$|\varphi(x') - \varphi(x'')| < \frac{\varepsilon}{5}$$
for all $\varphi \in \Phi$ whenever $|x' - x''| < \delta$. Divide the interval $a \le x \le b$ along the $x$-axis into subintervals of length less than $\delta$, by introducing points of subdivision $x_0, x_1, x_2, \ldots, x_n$ such that
$$a = x_0 < x_1 < x_2 < \cdots < x_n = b,$$
and then draw a vertical line through each of these points. Similarly, divide the interval $-K \le y \le K$ along the $y$-axis into subintervals of length less than $\varepsilon/5$, by introducing points of subdivision $y_0, y_1, y_2, \ldots, y_p$ such that
$$-K = y_0 < y_1 < y_2 < \cdots < y_p = K,$$
and then draw a horizontal line through each of these points. In this way, the rectangle $a \le x \le b$, $-K \le y \le K$ is divided into $np$ cells of horizontal side length less than $\delta$ and vertical side length less than $\varepsilon/5$. We now associate with each function $\varphi \in \Phi$ a polygonal line $y = \psi(x)$ which has vertices at points of the form $(x_k, y_l)$ and differs from the function $\varphi$ by less than $\varepsilon/5$ at every point $x_k$ (the reader should draw a figure and convince himself of the existence of such a function). Since
$$|\varphi(x_k) - \psi(x_k)| < \frac{\varepsilon}{5}, \qquad |\varphi(x_{k+1}) - \psi(x_{k+1})| < \frac{\varepsilon}{5}, \qquad |\varphi(x_k) - \varphi(x_{k+1})| < \frac{\varepsilon}{5}$$
by construction, we have
$$|\psi(x_k) - \psi(x_{k+1})| < \frac{3\varepsilon}{5}.$$
Moreover,
$$|\psi(x_k) - \psi(x)| < \frac{3\varepsilon}{5} \qquad (x_k \le x \le x_{k+1}),$$
since $\psi(x)$ is linear between the points $x_k$ and $x_{k+1}$. Let $x$ be any point in $[a, b]$ and $x_k$ the point of subdivision nearest to $x$ on the left. Then
$$|\varphi(x) - \psi(x)| \le |\varphi(x) - \varphi(x_k)| + |\varphi(x_k) - \psi(x_k)| + |\psi(x_k) - \psi(x)| < \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + \frac{3\varepsilon}{5} = \varepsilon,$$
i.e., the set of polygonal lines $\psi(x)$ forms an $\varepsilon$-net for $\Phi$. But there are obviously only finitely many such lines. Therefore $\Phi$ is totally bounded. ■
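The grid-and-polygon construction of Step 2 can be carried out numerically for a concrete family. The sketch below is an illustration under assumptions of our own, not part of the proof. It uses the family $\varphi_c(x) = \sin(x + c)$ on $[0, 2\pi]$, which is uniformly bounded by $K = 1$ and satisfies $|\varphi_c(x') - \varphi_c(x'')| \le |x' - x''|$, so $\delta = \varepsilon/5$ works for every $c$. The function names are hypothetical. Each $\varphi_c$ is assigned the polygonal line whose vertex at $x_k$ is the level $y_l$ nearest to $\varphi_c(x_k)$.

```python
import bisect
import math


def subdivision(a, b, max_step):
    """Equally spaced points a = x_0 < ... < x_n = b with spacing < max_step."""
    n = math.floor((b - a) / max_step) + 1
    return [a + (b - a) * k / n for k in range(n + 1)]


def polygon_heights(phi, xs, ys):
    """For each x_k pick the level y_l nearest to phi(x_k): the vertices of psi."""
    return [min(ys, key=lambda y: abs(y - phi(x))) for x in xs]


def evaluate_polygon(xs, heights, x):
    """Value at x of the polygonal line through the points (xs[k], heights[k])."""
    k = max(0, min(bisect.bisect_right(xs, x) - 1, len(xs) - 2))
    t = (x - xs[k]) / (xs[k + 1] - xs[k])
    return (1 - t) * heights[k] + t * heights[k + 1]


eps, K = 0.3, 1.0
a, b = 0.0, 2 * math.pi
xs = subdivision(a, b, eps / 5)   # horizontal cell side < delta = eps / 5
ys = subdivision(-K, K, eps / 5)  # vertical cell side < eps / 5
print("vertices per line:", len(xs), "levels:", len(ys))

for c in (0.0, 0.7, 1.9, 3.3):
    phi = lambda x, c=c: math.sin(x + c)
    heights = polygon_heights(phi, xs, ys)
    grid = [a + (b - a) * i / 2000 for i in range(2001)]
    err = max(abs(phi(x) - evaluate_polygon(xs, heights, x)) for x in grid)
    print(c, err < eps)
```

Only $(p+1)^{n+1}$ vertex choices exist, here $35^{106}$, which is astronomically large but finite. Finiteness is all that total boundedness requires.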

11.5. Peano's theorem. Arzelà's theorem has many applications, among them the following existence theorem for differential equations:

Theorem 5 (Peano). Let $f(x, y)$ be defined and continuous on a plane domain $G$. Then at least one integral curve of the differential equation
$$\frac{dy}{dx} = f(x, y) \tag{2}$$
passes through each point $(x_0, y_0)$ of $G$.

Proof. By the continuity of $f$, we have
$$|f(x, y)| \le K$$
in some domain $G' \subset G$ containing the point $(x_0, y_0)$. Draw the lines with slopes $K$ and $-K$ through the point $(x_0, y_0)$. Then draw vertical lines $x = a$ and $x = b$ ($a < x_0 < b$) which together with the first two lines form two isosceles triangles contained in $G'$ with common vertex $(x_0, y_0)$, as shown in Figure 13. This gives a closed interval $[a, b]$, which will figure in the rest of the proof.

The next step is to construct a family of polygonal lines, called Euler lines, associated with the differential equation (2). We begin by drawing the line with slope $f(x_0, y_0)$ through the point $(x_0, y_0)$. Next, choosing a point $(x_1, y_1)$ on the first line, we draw the line with slope $f(x_1, y_1)$ through the point $(x_1, y_1)$. Then, choosing a point $(x_2, y_2)$ on the second line, we draw the line with slope $f(x_2, y_2)$ through the point $(x_2, y_2)$, and so on indefinitely. Suppose we construct a whole sequence $L_1, L_2, \ldots, L_n, \ldots$ of such Euler lines going through the point $(x_0, y_0)$, with the property that the length of the longest line segment making up $L_n$ approaches 0 as $n \to \infty$. Let $\varphi_n$ be the function with graph $L_n$. Then this gives a family of functions $\varphi_1, \varphi_2, \ldots, \varphi_n, \ldots$, all defined on the interval $[a, b]$, which is easily seen to be uniformly bounded and equicontinuous (why?). It follows from Arzelà's theorem that the sequence $\{\varphi_n\}$ contains a uniformly convergent subsequence $\varphi^{(1)}, \varphi^{(2)}, \ldots, \varphi^{(n)}, \ldots$. Let
$$\varphi(x) = \lim_{n \to \infty} \varphi^{(n)}(x).$$
Then clearly
$$\varphi(x_0) = y_0,$$
so that the curve $y = \varphi(x)$ passes through the point $(x_0, y_0)$.
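The Euler lines just described are what numerical analysis calls Euler's polygon method. Below is a minimal sketch of the rightward half of the construction ($x \ge x_0$) with equal steps. The function name and the test equation $dy/dx = y$ are our own choices. For this equation the endpoint of the $n$-step line at $x = 1$ is $(1 + 1/n)^n$, which tends to $e = y(1)$. In general the proof above only guarantees a uniformly convergent subsequence, and, as the Remark following the proof points out, different subsequences may converge to different solutions.

```python
import math


def euler_line(f, x0, y0, b, n):
    """Vertices of an Euler line for dy/dx = f(x, y) from (x0, y0) to x = b.

    Uses n equal steps h = (b - x0) / n; on each step the line follows the
    slope f(x_k, y_k) taken at the left vertex, as in the construction above.
    """
    h = (b - x0) / n
    xs, ys = [x0], [y0]
    for _ in range(n):
        x, y = xs[-1], ys[-1]
        ys.append(y + h * f(x, y))
        xs.append(x + h)
    return xs, ys


# dy/dx = y through (0, 1); the exact solution is e**x, so y(1) = e.
for n in (10, 100, 1000):
    xs, ys = euler_line(lambda x, y: y, 0.0, 1.0, 1.0, n)
    print(n, ys[-1], abs(ys[-1] - math.e))
```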

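We now show that $y = \varphi(x)$ satisfies the differential equation (2) in the open interval $(a, b)$. This means showing that, given any $\varepsilon > 0$ and any points $x', x'' \in (a, b)$, we have
$$\left| \frac{\varphi(x'') - \varphi(x')}{x'' - x'} - f(x', \varphi(x')) \right| < \varepsilon$$
whenever $|x'' - x'|$ is sufficiently small, or equivalently that
$$\left| \frac{\varphi^{(n)}(x'') - \varphi^{(n)}(x')}{x'' - x'} - f(x', \varphi(x')) \right| < \varepsilon \tag{3}$$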
whenever $n$ is sufficiently large and $|x'' - x'|$ is sufficiently small. Let $y' = \varphi(x')$. Then, by the continuity of $f$, given any $\varepsilon > 0$, there is a number $\eta > 0$ such that
$$f(x', y') - \varepsilon < f(x, y) < f(x', y') + \varepsilon$$
whenever
$$|x - x'| < 2\eta, \qquad |y - y'| < 4K\eta.$$
The set of points $(x, y)$ satisfying these inequalities is a rectangle, which we denote by $Q$. Let $N$ be so large that for all $n > N$, the length of the longest segment making up $L_n$ is less than $\eta$ and moreover
$$|\varphi(x) - \varphi^{(n)}(x)| < K\eta.$$
Then all the Euler lines $L_n$ with $n > N$ lie inside the rectangle $Q$ (why?). Suppose $L_n$ has vertices $(a_0, b_0), (a_1, b_1), \ldots, (a_{k+1}, b_{k+1})$, where¹³
$$a_0 \le x' < a_1 < a_2 < \cdots < a_k < x'' \le a_{k+1}.$$
Then
$$\varphi^{(n)}(a_1) - \varphi^{(n)}(x') = f(a_0, b_0)(a_1 - x'),$$
$$\varphi^{(n)}(a_{i+1}) - \varphi^{(n)}(a_i) = f(a_i, b_i)(a_{i+1} - a_i) \qquad (i = 1, 2, \ldots, k - 1),$$
$$\varphi^{(n)}(x'') - \varphi^{(n)}(a_k) = f(a_k, b_k)(x'' - a_k).$$
Hence, if $|x'' - x'| < \eta$,
$$[f(x', y') - \varepsilon](a_1 - x') < \varphi^{(n)}(a_1) - \varphi^{(n)}(x') < [f(x', y') + \varepsilon](a_1 - x'),$$
$$[f(x', y') - \varepsilon](a_{i+1} - a_i) < \varphi^{(n)}(a_{i+1}) - \varphi^{(n)}(a_i) < [f(x', y') + \varepsilon](a_{i+1} - a_i) \qquad (i = 1, 2, \ldots, k - 1),$$
$$[f(x', y') - \varepsilon](x'' - a_k) < \varphi^{(n)}(x'') - \varphi^{(n)}(a_k) < [f(x', y') + \varepsilon](x'' - a_k).$$
Adding these inequalities, we get
$$[f(x', y') - \varepsilon](x'' - x') < \varphi^{(n)}(x'') - \varphi^{(n)}(x') < [f(x', y') + \varepsilon](x'' - x')$$
if $|x'' - x'| < \eta$, which is equivalent to (3). ■

Remark. Different subsequences of a sequence of Euler lines may converge to different solutions of the differential equation (2). Hence the solution $\varphi$ found in the proof of Theorem 5 may not be the unique solution of (2) passing through the point $(x_0, y_0)$.

Problem 1. Let $M$ be a totally bounded subset of a metric space $R$. Prove that the $\varepsilon$-nets figuring in the definition of total boundedness of $M$ can always be chosen to consist of points of $M$ rather than of $R$.

Hint. Given an $\varepsilon$-net for $M$ consisting of points $a_1, a_2, \ldots, a_n \in R$, all within $\varepsilon$ of some point of $M$, replace each point $a_k$ by a point $b_k \in M$ such that $\rho(a_k, b_k) \le \varepsilon$.

¹³ To be explicit, we assume that $x'' > x'$. The case $x'' < x'$ is treated similarly.

Problem 2. Prove that every totally bounded metric space is separable.

Hint. Construct a finite $(1/n)$-net for every $n = 1, 2, \ldots$. Then take the union of these nets.

Problem 3. Let $M$ be a bounded subset of the space $C_{[a,b]}$. Prove that the set of all functions
$$F(x) = \int_a^x f(t)\,dt$$
with $f \in M$ is relatively compact.

Problem 4. Given two metric compacta $X$ and $Y$, let $C_{XY}$ be the set of all continuous mappings of $X$ into $Y$. Let distance be defined in $C_{XY}$ by the formula
$$\rho(f, g) = \sup_{x \in X} \rho(f(x), g(x)). \tag{4}$$
Prove that $C_{XY}$ is a metric space. Let $M_{XY}$ be the set of all mappings of $X$ into $Y$, with the same metric (4). Prove that $C_{XY}$ is closed in $M_{XY}$.

Hint. Use the method of Problem 1, p. 65 to prove that the limit of a uniformly convergent sequence of continuous mappings is itself a continuous mapping.

Problem 5. Let $X$, $Y$ and $C_{XY}$ be the same as in the preceding problem. Prove the following generalization of Arzelà's theorem: A necessary and sufficient condition for a set $D \subset C_{XY}$ to be relatively compact is that $D$ be an equicontinuous family of functions, in the sense that given any $\varepsilon > 0$, there exists a number $\delta > 0$ such that $\rho(x', x'') < \delta$ implies $\rho(f(x'), f(x'')) < \varepsilon$ for all $x', x'' \in X$ and all $f \in D$.

Hint. To prove the sufficiency, show that $D$ is relatively compact in $M_{XY}$ (defined in the preceding problem) and hence in $C_{XY}$, since $C_{XY}$ is closed in $M_{XY}$. To prove the relative compactness of $D$ in $M_{XY}$, first represent $X$ as a union of finitely many pairwise disjoint sets $E_i$ such that $x', x'' \in E_i$ implies $\rho(x', x'') < \delta$. For example, let $x_1, \ldots, x_n$ be a $(\delta/2)$-net for $X$, and let
$$E_i = S[x_i, \delta/2] - \bigcup_{j<i} S[x_j, \delta/2].$$
Then let $y_1, \ldots, y_p$ be an $\varepsilon$-net in $Y$, and let $L$ be the set of all functions taking one of the values $y_1, \ldots, y_p$ on each of the sets $E_i$. Given any $f \in D$ and any $x_i \in \{x_1, \ldots, x_n\}$, let $y_{k_i} \in \{y_1, \ldots, y_p\}$ be such that $\rho(f(x_i), y_{k_i}) < \varepsilon$, and let $g \in L$ be such that $g(x_i) = y_{k_i}$. Show that $\rho(f(x), g(x)) < 2\varepsilon$, thereby proving that $L$ is a finite $2\varepsilon$-net for $D$ in $M_{XY}$.
12. Real Functions on Metric and Topological Spaces

12.1. Continuous and uniformly continuous functions and functionals. Let $T$ be a topological space, in particular a metric space. Then by a real function on $T$ we mean a mapping of $T$ into the space $R^1$ (the real line). For example, a real function on Euclidean $n$-space $R^n$ is just the usual “function of $n$ variables.” Suppose $T$ is a function space, i.e., a space whose elements are functions. Then a real function on $T$ is called a functional.

Example 1. Let $x(t)$ be a function defined on the interval $[0, 1]$, let $\varphi(s_0, s_1, \ldots, s_n)$ be a function of $n + 1$ variables defined for all real values of its arguments, and let $f(t, u)$ be a function of two variables defined for all $t \in [0, 1]$ and all real $u$. Then the following are all functionals:
$$F_1(x) = \sup_{0 \le t \le 1} x(t),$$
$$F_2(x) = \inf_{0 \le t \le 1} x(t),$$
$$F_3(x) = x(t_0) \quad \text{where } t_0 \in [0, 1],$$
$$F_4(x) = \varphi[x(t_0), x(t_1), \ldots, x(t_n)],$$
$$F_5(x) = \int_0^1 f[t, x(t)]\,dt,$$
$$F_6(x) = x'(t_0) \quad \text{where } t_0 \in [0, 1],$$
$$F_7(x) = \int_0^1 \sqrt{1 + x'^2(t)}\,dt,$$
$$F_8(x) = \int_0^1 |x'(t)|\,dt.$$
The functionals $F_1$, $F_2$, $F_3$, $F_4$ and $F_5$ are defined on the space $C$ of all functions continuous on the interval $[0, 1]$. On the other hand, $F_6$ is defined only for functions differentiable at the point $t_0$, $F_7$ is defined only for functions such that the expression $\sqrt{1 + x'^2(t)}$ is integrable, and $F_8$ is defined only for functions with integrable $|x'(t)|$.

Example 2. The functional $F_1$ is continuous on $C$, since
$$\rho(x, y) = \sup |x - y|, \qquad |\sup x - \sup y| \le \sup |x - y|.$$

Example 3. The functional $F_6$ is discontinuous on $C$ at any point $x_0$ where it is defined. In fact, let $x(t)$ be such that $x'(t_0) = 1$ and $|x(t)| < \varepsilon$, and let $y = x_0 + x$. Then $y'(t_0) = x_0'(t_0) + 1$ even though $\rho(x_0, y) < \varepsilon$. However, $F_6$ is continuous if it is defined on the space $C^{(1)}$ of all functions continuously differentiable on the interval $[0, 1]$, with metric
$$\rho(x, y) = \sup_{0 \le t \le 1} \bigl[\,|x(t) - y(t)| + |x'(t) - y'(t)|\,\bigr].$$
Example 4. The functional $F_7$ is also discontinuous on $C$. In fact, let
$$x_0(t) \equiv 0, \qquad x_n(t) = \frac{1}{n} \sin 2\pi n t.$$
Then
$$\rho(x_n, x_0) = \frac{1}{n} \to 0,$$
but $F_7(x_n) > 4$ for all $n$ (since $\sqrt{1 + 4\pi^2\cos^2 2\pi n t} > 2\pi|\cos 2\pi n t|$ and $\int_0^1 2\pi|\cos 2\pi n t|\,dt = 4$), while $F_7(x_0) = 1$. Hence $F_7(x_n)$ fails to approach $F_7(x_0)$ even though $x_n \to x_0$.
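Example 4 can be checked numerically. The sketch below uses helper names of our own. It approximates $\rho(x_n, x_0)$ by a maximum over a grid and $F_7(x_n)$ by the length of an inscribed polygon, which never exceeds the true arc length. The loop variable is called `m` to keep it apart from the number of grid points. The distances shrink like $1/m$, while the polygon lengths stay above 4.

```python
import math


def inscribed_length(x, n):
    """Length of the polygon inscribed in the graph of x on [0, 1] (n equal steps).

    For continuously differentiable x this approaches F_7(x) as n grows.
    """
    ts = [k / n for k in range(n + 1)]
    return sum(math.hypot(t1 - t0, x(t1) - x(t0)) for t0, t1 in zip(ts, ts[1:]))


def sup_distance(x, y, n=10000):
    """Grid approximation of rho(x, y) = sup |x(t) - y(t)| on [0, 1]."""
    return max(abs(x(k / n) - y(k / n)) for k in range(n + 1))


zero = lambda t: 0.0
for m in (1, 2, 5, 10):
    x_m = lambda t, m=m: math.sin(2 * math.pi * m * t) / m
    print(m, sup_distance(x_m, zero), inscribed_length(x_m, 20000))
```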

The ordinary concept of uniform continuity generalizes at once to the 
case of arbitrary metric spaces: 

Definition 1. A real function $f(x)$ defined on a metric space $R$ is said to be uniformly continuous on $R$ if, given any $\varepsilon > 0$, there is a $\delta > 0$ such that $\rho(x_1, x_2) < \delta$ implies $|f(x_1) - f(x_2)| < \varepsilon$ for all $x_1, x_2 \in R$.

The reader will recall from calculus that a real function continuous on a 
closed interval [a, b] is uniformly continuous on [a, b]. This fact is a special 
case of 

Theorem 1. A real function f continuous on a compact metric space R 
is uniformly continuous on R. 

Proof. Suppose $f$ is continuous but not uniformly continuous on $R$. Then for some positive $\varepsilon$ and every $n$ there are points $x_n$ and $x'_n$ in $R$ such that
$$\rho(x_n, x'_n) < \frac{1}{n} \tag{1}$$
but
$$|f(x_n) - f(x'_n)| \ge \varepsilon. \tag{2}$$
Since $R$ is compact, the sequence $\{x_n\}$ has a subsequence $\{x_{n_k}\}$ converging to a point $x \in R$ (why?). Hence $\{x'_{n_k}\}$ also converges to $x$, because of (1). But then at least one of the inequalities
$$|f(x) - f(x_{n_k})| \ge \frac{\varepsilon}{2}, \qquad |f(x) - f(x'_{n_k})| \ge \frac{\varepsilon}{2}$$
must hold for arbitrary $k$, because of (2). This contradicts the assumed continuity of $f$ at $x$. ■

12.2. Continuous and semicontinuous functions on compact spaces. As just shown, the theorem on uniform continuity of a function continuous on a closed interval generalizes to functions continuous on arbitrary metric compacta. There are other properties of functions continuous on a closed interval which generalize to arbitrary compact spaces (not necessarily metric spaces):

Theorem 2. A real function $f$ continuous on a compact topological space $T$ is bounded on $T$.¹⁴ Moreover, $f$ achieves its least upper bound and greatest lower bound on $T$.

Proof. A continuous real function on $T$ is a continuous mapping of $T$ into the real line $R^1$. The image of $T$ in $R^1$ is compact, by Theorem 5, p. 94. But every compact subset of $R^1$ is bounded and closed (see p. 101). Hence $f$ is bounded on $T$. Moreover, since a closed bounded subset of $R^1$ contains its least upper bound and greatest lower bound, $f$ not only has a least upper bound and greatest lower bound on $T$, but actually achieves these bounds at points of $T$. ■

Theorem 2 can be generalized to a larger class of functions, which we 
now introduce: 

Definition 2. A (real) function $f$ defined on a topological space $T$ is said to be upper semicontinuous at a point $x_0 \in T$ if, given any $\varepsilon > 0$, there exists a neighborhood of $x_0$ in which $f(x) < f(x_0) + \varepsilon$. Similarly, $f$ is said to be lower semicontinuous at $x_0$ if, given any $\varepsilon > 0$, there exists a neighborhood of $x_0$ in which $f(x) > f(x_0) - \varepsilon$.

Example 1. Let $[x]$ be the integral part of $x$, i.e., the largest integer $\le x$. Then $f(x) = [x]$ is upper semicontinuous for all $x$.

Example 2. Given a continuous function $f$, suppose we increase the value $f(x_0)$ taken by $f$ at the point $x_0$. Then $f$ becomes upper semicontinuous at $x_0$. Similarly, $f$ becomes lower semicontinuous at $x_0$ if we decrease $f(x_0)$. Moreover, $f$ is upper semicontinuous if and only if $-f$ is lower semicontinuous. These facts can be used to construct many more examples of semicontinuous functions.

In studying the properties of semicontinuous functions, it is convenient to allow them to take infinite values. If $f(x_0) = +\infty$, we regard $f$ as upper semicontinuous at $x_0$. The function $f$ is also regarded as lower semicontinuous at $x_0$ if, given any $h > 0$, there is a neighborhood of $x_0$ in which $f(x) > h$. Similarly, if $f(x_0) = -\infty$, we regard $f$ as lower semicontinuous at $x_0$, and at the same time upper semicontinuous at $x_0$ if, given any $h > 0$, there is a neighborhood of $x_0$ in which $f(x) < -h$.

We now prove the promised generalization of Theorem 2: 


¹⁴ A real function (or functional) $f$ is said to be bounded on a set $E$ if $f(E)$ is contained in some interval $[-C, C]$.
Theorem 2'. A finite lower semicontinuous function $f$ defined on a compact topological space $T$ is bounded from below.

Proof. Suppose to the contrary that $\inf f(x) = -\infty$. Then there exists a sequence $\{x_n\}$ such that $f(x_n) < -n$. Since $T$ is compact, the infinite set $E = \{x_1, x_2, \ldots, x_n, \ldots\}$ has at least one limit point $x_0$. Since $f$ is finite and lower semicontinuous at $x_0$, there is a neighborhood $U$ of $x_0$ in which $f(x) > f(x_0) - 1$. But then $U$ can only contain finitely many points of $E$, so that $x_0$ cannot be a limit point of $E$. ■

Theorem 2". A finite lower semicontinuous function $f$ defined on a compact topological space $T$ achieves its greatest lower bound on $T$.

Proof. By Theorem 2', $\inf f(x)$ is finite. Clearly, there exists a sequence $\{x_n\}$ such that
$$f(x_n) < \inf f(x) + \frac{1}{n}.$$
By the compactness of $T$, the set $E = \{x_1, x_2, \ldots, x_n, \ldots\}$ has at least one limit point $x_0$. If $f(x_0) > \inf f$, then, by the semicontinuity of $f$ at $x_0$, there is a neighborhood $U$ of the point $x_0$ and a $\delta > 0$ such that $f(x) > \inf f + \delta$ for all $x \in U$. But then $U$ cannot contain an infinite subset of $E$, i.e., $x_0$ cannot be a limit point of $E$. It follows that $f(x_0) = \inf f$. ■

Remark. Theorems 2' and 2" remain true if the words “lower,” “below,” 
and “greatest” are replaced by “upper,” “above,” and “least.” The details 
are left as an exercise. 

We conclude this section with some useful terminology: 

Definition 3. Given a real function $f$ defined on a metric space $R$, the (finite or infinite) quantity
$$\overline{f}(x_0) = \lim_{\varepsilon \to 0} \Bigl\{ \sup_{x \in S(x_0, \varepsilon)} f(x) \Bigr\}$$
is called the upper limit of $f$ at $x_0$, while the (finite or infinite) quantity
$$\underline{f}(x_0) = \lim_{\varepsilon \to 0} \Bigl\{ \inf_{x \in S(x_0, \varepsilon)} f(x) \Bigr\}$$
is called the lower limit of $f$ at $x_0$. The difference
$$\omega f(x_0) = \overline{f}(x_0) - \underline{f}(x_0),$$
provided it exists,¹⁵ is called the oscillation of $f$ at $x_0$.

¹⁵ I.e., provided at least one of the numbers $\overline{f}(x_0)$, $\underline{f}(x_0)$ is finite.
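For a function on the real line, the quantities in Definition 3 can be estimated by sampling the balls $S(x_0, r)$ for decreasing $r$. As $r$ shrinks, the supremum over the ball can only decrease and the infimum can only increase, which is why the limits exist (possibly infinite). The sketch below is only a rough numerical aid under our own assumptions: the helper name is ours, and a sampled supremum can underestimate the true one. It shows that the sign function has upper limit 1, lower limit $-1$ and oscillation 2 at 0, whereas $x^2$ has oscillation 0 there.

```python
def ball_sup_inf(f, x0, r, samples=2001):
    """Sampled sup and inf of f over the open ball S(x0, r) on the real line."""
    pts = [x0 - r + 2 * r * k / (samples - 1) for k in range(1, samples - 1)]
    values = [f(x) for x in pts]
    return max(values), min(values)


def sign(x):
    return (x > 0) - (x < 0)


for name, f in (("sign", sign), ("square", lambda x: x * x)):
    for r in (1e-1, 1e-2, 1e-3):
        up, low = ball_sup_inf(f, 0.0, r)
        print(name, r, up, low, "oscillation estimate:", up - low)
```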
12.3. Continuous curves in metric spaces. Instead of mappings of a metric space into the real line, we now consider mappings of a subset of the real line into a metric space. More exactly, let $P = f(t)$ be a continuous mapping of the interval $a \le t \le b$ into a metric space $R$. As $t$ “traverses” the interval from $a$ to $b$, the point $P = f(t)$ “traverses a continuous curve” in the space $R$. Before giving a formal definition corresponding to this rough idea of a “curve,” we make two key observations:

1) The order in which points are traversed will be regarded as an essential property of a curve. For example, the set of points shown in Figure 14(a) gives rise to two distinct curves when traversed in the two distinct ways shown in Figures 14(b) and 14(c). Similarly, the function shown in Figure 15(a), defined in the interval $0 \le t \le 1$, determines a “curve” filling up the segment $0 \le y \le 1$ of the $y$-axis, but this curve is traversed three times (twice upward and once downward) and hence is distinct from the segment $0 \le y \le 1$ traversed just once from the point $y = 0$ to the point $y = 1$.

2) The choice of the parameter $t$ will be regarded as unimportant, provided a change in parameter does not change the order in which the points of the curve are traversed. Thus the functions shown in Figures 15(a) and 15(b) represent the same curve, even though a given point of the curve corresponds to different parameter values in the two cases. For example, the point $A$ in Figure 15(a) corresponds to two isolated points $C$ and $D$ on the $t$-axis, while in Figure 15(b) the same point $A$ corresponds to an isolated point $C$ and a whole line segment $DE$ (note that the point on the curve does not move at all as $t$ traverses the segment $DE$).

We now give a formal definition of a curve, embodying these qualitative ideas. Two continuous functions
$$P = f(t') \quad \text{and} \quad P = g(t''),$$
defined on intervals
$$a' \le t' \le b' \quad \text{and} \quad a'' \le t'' \le b''$$
and taking values in a metric space $R$, are said to be equivalent if there exist two continuous nondecreasing functions
$$t' = \varphi(t), \qquad t'' = \psi(t),$$
defined on the same interval
$$a \le t \le b,$$
such that
$$\varphi(a) = a', \quad \varphi(b) = b', \quad \psi(a) = a'', \quad \psi(b) = b''$$
and
$$f(\varphi(t)) = g(\psi(t))$$
for all $t \in [a, b]$.


It is easy to see that this relation of equivalence is reflexive ($f$ is equivalent to $f$), symmetric (if $f$ is equivalent to $g$, then $g$ is equivalent to $f$) and transitive (if $f$ is equivalent to $g$ and $g$ is equivalent to $h$, then $f$ is equivalent to $h$). Hence the set of all continuous functions of the given type can be partitioned into classes of equivalent functions (cf. Sec. 1.4), and each such class is said to define a (continuous) curve in the space $R$.

For each function $P = f(t')$ defined on an interval $[a', b']$, there is an equivalent function defined on the interval $[a'', b''] = [0, 1]$. In fact, we need only make the choice
$$t' = \varphi(t) = (b' - a')t + a', \qquad t'' = \psi(t) = t \qquad (0 \le t \le 1).$$

Thus every curve can be regarded as specified parametrically in terms of a function defined on the unit interval $I = [0, 1]$. By the same token, it is often convenient¹⁶ to introduce the space $C(I, R)$ of continuous mappings $f$ of the interval $I$ into the space $R$, equipped with the metric
$$\rho(f, g) = \sup_{0 \le t \le 1} \rho(f(t), g(t)), \tag{3}$$
where $\rho$ is the metric in the space $R$.
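The passage to the unit interval and the metric (3) can be sketched in a few lines. The code is illustrative only: the helper names are ours, and the supremum in (3) is approximated from below by a maximum over a grid. It also shows that (3) measures distances between particular parametrizations, not between point sets. The circle of radius 2, run from angle $-\pi$ instead of 0, is at distance 3 from the unit circle, but only at distance 1 when both start at angle 0.

```python
import math


def to_unit_interval(f, a, b):
    """Equivalent parametrization on [0, 1] of P = f(t'), a <= t' <= b,
    obtained from the substitution t' = (b - a) * t + a."""
    return lambda t: f((b - a) * t + a)


def curve_distance(f, g, metric, n=1000):
    """Grid approximation of metric (3): max of metric(f(t), g(t)) over t = k/n."""
    return max(metric(f(k / n), g(k / n)) for k in range(n + 1))


circle = lambda s: (math.cos(s), math.sin(s))
big_circle = lambda s: (2 * math.cos(s), 2 * math.sin(s))

F = to_unit_interval(circle, 0.0, 2 * math.pi)
G = to_unit_interval(big_circle, -math.pi, math.pi)   # starts opposite to F
H = to_unit_interval(big_circle, 0.0, 2 * math.pi)    # starts aligned with F
print(curve_distance(F, G, math.dist), curve_distance(F, H, math.dist))
```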


[Figure 15]

¹⁶ Cf. Problems 7-12.
Problem 1. Let the functionals $F_1$, $F_2$, $F_3$, $F_4$, $F_5$ and the space $C$ be the same as on p. 108. Prove that

a) $F_2$, $F_3$ and $F_5$ are continuous on $C$;

b) $F_4$ is continuous on $C$ if the function $\varphi$ is continuous in all its arguments;

c) $F_1$ is uniformly continuous on $C$.

Define $F_1$, $F_2$, $F_3$ and $F_4$ on a space larger than $C$.

Problem 2. Let the functionals $F_7$, $F_8$ and the spaces $C$, $C^{(1)}$ be the same as on p. 108. Prove that

a) $F_8$ is discontinuous on $C$;

b) $F_7$ and $F_8$ are continuous on $C^{(1)}$.

Problem 3. Let $M$ be the space of all bounded real functions defined on the interval $[a, b]$, with metric $\rho(f, g) = \sup |f - g|$. By the length of the curve
$$y = f(x) \qquad (a \le x \le b)$$
is meant the functional
$$L(f) = \sup \sum_{i=1}^{n} \sqrt{(x_i - x_{i-1})^2 + (f(x_i) - f(x_{i-1}))^2},$$
where the least upper bound (which may equal $+\infty$) is taken over all possible partitions of $[a, b]$ obtained by introducing points of subdivision $x_0, x_1, x_2, \ldots, x_n$ such that
$$a = x_0 < x_1 < x_2 < \cdots < x_n = b.$$
Prove that

a) For continuous functions
$$L(f) = \lim_{\max |x_i - x_{i-1}| \to 0} \sum_{i=1}^{n} \sqrt{(x_i - x_{i-1})^2 + (f(x_i) - f(x_{i-1}))^2};$$

b) For continuously differentiable functions
$$L(f) = \int_a^b \sqrt{1 + f'^2(x)}\,dx;$$

c) The functional $L(f)$ is lower semicontinuous on $M$.
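Parts a) and b) of Problem 3 can be illustrated, though of course not proved, numerically. The sketch below (function name ours) evaluates the sums in the definition of $L(f)$ for $f(x) = x^2$ on $[0, 1]$ with $2^j$ equal parts. Successive partitions are refinements of one another, so by the triangle inequality the sums cannot decrease. They approach $\int_0^1 \sqrt{1 + 4x^2}\,dx = \sqrt{5}/2 + \tfrac{1}{4}\operatorname{arcsinh} 2$.

```python
import math


def polygon_length(f, a, b, n):
    """The sum in the definition of L(f) for the partition of [a, b] into n equal parts."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(math.hypot(x1 - x0, f(x1) - f(x0)) for x0, x1 in zip(xs, xs[1:]))


f = lambda x: x * x
# Part b): the integral of sqrt(1 + 4x^2) over [0, 1] equals sqrt(5)/2 + asinh(2)/4
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
lengths = [polygon_length(f, 0.0, 1.0, 2 ** j) for j in range(11)]
for j, length in enumerate(lengths):
    print(2 ** j, length, exact - length)
```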

Problem 4. Let $\overline{f}$, $\underline{f}$ and $\omega f$ be the same as in Definition 3. Prove that

a) $\overline{f}$ is upper semicontinuous;

b) $\underline{f}$ is lower semicontinuous;

c) $f$ is continuous at $x_0$ if and only if $-\infty < \underline{f}(x_0) = \overline{f}(x_0) < \infty$, i.e., if and only if $\omega f(x_0) = 0$.



Problem 5. Let $K$ be a metric compactum and $A$ a mapping of $K$ into itself such that $\rho(Ax, Ay) < \rho(x, y)$ if $x \ne y$. Prove that $A$ has a unique fixed point in $K$. Reconcile this with Problem 1, p. 76.

Problem 6. Let $K$ be a metric compactum and $\{f_n(x)\}$ a sequence of continuous functions on $K$, increasing in the sense that

$$f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots$$

Prove that if $\{f_n(x)\}$ converges to a continuous function on $K$, then the convergence is uniform (Dini’s theorem).

Problem 7. A sequence of curves $\{\Gamma_n\}$ in a metric space $R$ is said to converge to a curve $\Gamma$ in $R$ if the curves $\Gamma_n$ and $\Gamma$ have parametric representations

$$P = f_n(t) \quad (0 \le t \le 1)$$

and

$$P = f(t) \quad (0 \le t \le 1),$$

respectively, such that

$$\lim_{n \to \infty} \rho(f, f_n) = 0,$$

where $\rho$ is the metric (3) of the space $C(I, R)$ introduced on p. 113. Prove that if a sequence of curves in a compact metric space $R$ can be represented parametrically by an equicontinuous family of functions on $[0, 1]$, then the sequence contains a convergent subsequence.

Hint. Use Problem 5, p. 107.

Problem 8. Let $\Gamma$ be a curve in a metric space $R$, with parametric representation

$$P = f(t) \quad (a \le t \le b).$$

By the length of $\Gamma$ is meant the functional

$$L(\Gamma) = L(f) = \sup \sum_{i=1}^{n} \rho(f(t_{i-1}), f(t_i)),$$

where $\rho$ is the metric in $R$ and the least upper bound (which may equal $+\infty$) is taken over all possible partitions of $[a, b]$ obtained by introducing points of subdivision $t_0, t_1, t_2, \ldots, t_n$ such that

$$a = t_0 < t_1 < t_2 < \cdots < t_n = b.$$

Prove that $L(\Gamma)$ is independent of the parametric representation of $\Gamma$. Suppose we choose $a = 0$, $b = 1$, thereby confining ourselves to parametric representations of the form

$$P = f(t) \quad (0 \le t \le 1).$$

Prove that $L(f)$ is then a lower semicontinuous functional on the space $C(I, R)$ introduced on p. 113. Equivalently, prove that if a sequence of curves $\{\Gamma_n\}$ converges to a curve $\Gamma$, in the sense of Problem 7, then $L(\Gamma)$ does not exceed the smallest limit point (i.e., the lower limit) of the sequence $\{L(\Gamma_n)\}$.

Problem 9. Given a metric space $R$ with metric $\rho$, let $\Gamma$ be a curve in $R$ of finite length $S$ with parametric representation

$$P = f(t) \quad (a \le t \le b).$$

Let $s = \varphi(T)$ be the length of the arc

$$P = f(t) \quad (a \le t \le T)$$

(where $T \le b$), i.e., the arc of $\Gamma$ going from the “initial point” $P_a = f(a)$ to the “final point” $P_T = f(T)$. Then $\Gamma$ has a parametric representation of the form

$$P = g(s) \quad (0 \le s \le S),$$

where $g(s) = f(\varphi^{-1}(s))$ if $\varphi$ is one-to-one. Prove that

$$\rho(g(s_1), g(s_2)) \le |s_1 - s_2|.$$

Hint. The length of an arc is no less than the length of the inscribed chord.

Problem 10. In the preceding problem, let $t = s/S$. Then $\Gamma$ has a parametric representation

$$P = F(t) = g(St) \quad (0 \le t \le 1)$$

in terms of a function $F$ defined on the unit interval $[0, 1]$. Prove that $F$ satisfies a Lipschitz condition of the form

$$\rho(F(t_1), F(t_2)) \le S\,|t_1 - t_2|.$$

Suppose $R$ is compact and let $\{\Gamma_n\}$ be a sequence of curves, all of length less than some finite number $M$. Prove that $\{\Gamma_n\}$ contains a convergent subsequence, where convergence of curves is defined as in Problem 7.

Problem 11. Given a compact metric space R, suppose two points A and B 
in R can be joined by a continuous curve of finite length. Prove that among 
all such curves, there is a curve of least length. 

Comment. Even in the case where R is a “smooth” (i.e., sufficiently 
differentiable) closed surface in Euclidean 3-space, this result is not amenable 
to the methods of elementary differential geometry, which ordinarily deals 
only with the case of “neighboring” points A and B. 

Problem 12. Let $\mathcal{C}$ be the set of all curves in a given metric space $R$. Define the distance between two curves $\Gamma_1, \Gamma_2 \in \mathcal{C}$ by the formula

$$\rho(\Gamma_1, \Gamma_2) = \inf \rho(f_1, f_2), \qquad (4)$$

where $\rho$ on the right is the metric (3) in the space $C(I, R)$, and the greatest lower bound is taken over all possible representations

$$P = f_1(t) \quad (0 \le t \le 1) \qquad (5)$$

of $\Gamma_1$ and

$$P = f_2(t) \quad (0 \le t \le 1) \qquad (6)$$

of $\Gamma_2$. Prove that the metric $\rho$ makes $\mathcal{C}$ into a metric space.

Comment. The fact that $\rho(\Gamma_1, \Gamma_2) = 0$ implies the identity of $\Gamma_1$ and $\Gamma_2$ follows from the (not very easily proved) fact that the greatest lower bound in (4) is achieved for a suitable choice of the parametric representations (5) and (6).
LINEAR SPACES


13. Basic Concepts 

13.1. Definitions and examples. One of the most important concepts in 
mathematics is that of a linear space, which will play a key role in the rest 
of this book: 

Definition 1. A nonempty set $L$ of elements $x, y, z, \ldots$ is said to be a linear space (or vector space) if it satisfies the following three axioms:

1) Any two elements $x, y \in L$ uniquely determine a third element $x + y \in L$, called the sum of $x$ and $y$, such that

a) $x + y = y + x$ (commutativity);

b) $(x + y) + z = x + (y + z)$ (associativity);

c) There exists an element $0 \in L$, called the zero element, with the property that $x + 0 = x$ for every $x \in L$;

d) For every $x \in L$ there exists an element $-x$, called the negative of $x$, with the property that $x + (-x) = 0$;

2) Any number $\alpha$ and any element $x \in L$ uniquely determine an element $\alpha x \in L$, called the product of $\alpha$ and $x$, such that

a) $\alpha(\beta x) = (\alpha\beta)x$;

b) $1 \cdot x = x$;

3) The operations of addition and multiplication obey two distributive laws:

a) $(\alpha + \beta)x = \alpha x + \beta x$;

b) $\alpha(x + y) = \alpha x + \alpha y$.


Remark. The elements of $L$ are called “points” or “vectors,” while the numbers $\alpha, \beta, \ldots$ are often called “scalars.” If the scalars $\alpha$ range over all real numbers, $L$ is called a real linear space, while if they range over all complex numbers, $L$ is called a complex linear space.¹ Unless the contrary is explicitly stated, the considerations that follow will be valid for both real and complex spaces. Clearly, any complex linear space reduces to a real linear space if we allow vectors to be multiplied by real numbers only.

We now give some examples of linear spaces, leaving it to the reader 
to verify in detail that the conditions in Definition 1 are satisfied in each case. 2 

Example 1. The real line (the set of all real numbers) with the usual 
arithmetic operations of addition and multiplication is a linear space. 

Example 2. The set of all ordered $n$-tuples

$$x = (x_1, x_2, \ldots, x_n)$$

of real or complex numbers $x_1, x_2, \ldots, x_n$, with sums and “scalar multiples” defined by the formulas

$$(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n),$$
$$\alpha(x_1, x_2, \ldots, x_n) = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n),$$

is also a linear space. This space is called $n$-dimensional (vector) space, or simply $n$-space, denoted by $R^n$ in the real case and $C^n$ in the complex case. (Concerning the precise meaning of the term “$n$-dimensional,” see Sec. 13.2.)
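The axioms of Definition 1 can be spot-checked mechanically for the $n$-tuples of this example. The sketch below (all names hypothetical) uses exact rational arithmetic through Python's `fractions.Fraction`, so equality tests are not spoiled by rounding. Passing such a check on finitely many sample vectors and scalars is not a proof; it only shows that the componentwise formulas behave as the axioms require.

```python
from fractions import Fraction
from itertools import product


def add(x, y):
    """Componentwise sum (x_1 + y_1, ..., x_n + y_n)."""
    return tuple(a + b for a, b in zip(x, y))


def scale(alpha, x):
    """Scalar multiple (alpha x_1, ..., alpha x_n)."""
    return tuple(alpha * a for a in x)


def check_axioms(vectors, scalars):
    """Assert axioms 1a-1d, 2a-2b and 3a-3b of Definition 1 on the given samples."""
    zero = tuple(Fraction(0) for _ in vectors[0])
    for x, y, z in product(vectors, repeat=3):
        assert add(x, y) == add(y, x)                                  # 1a
        assert add(add(x, y), z) == add(x, add(y, z))                  # 1b
    for x in vectors:
        assert add(x, zero) == x                                       # 1c
        assert add(x, scale(-1, x)) == zero                            # 1d, with -x = (-1)x
        assert scale(1, x) == x                                        # 2b
    for a, b in product(scalars, repeat=2):
        for x, y in product(vectors, repeat=2):
            assert scale(a, scale(b, x)) == scale(a * b, x)            # 2a
            assert scale(a + b, x) == add(scale(a, x), scale(b, x))    # 3a
            assert scale(a, add(x, y)) == add(scale(a, x), scale(a, y))  # 3b
    return True


F = Fraction
vectors = [(F(1), F(-2), F(3, 4)), (F(0), F(5), F(-1, 3)), (F(2, 7), F(1), F(0))]
scalars = [F(0), F(1), F(-3, 2), F(5)]
print(check_axioms(vectors, scalars))  # True
```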

Example 3. The set of all (real or complex) functions continuous on an interval $[a, b]$, with the usual operations of addition of functions and multiplication of functions by numbers, forms a linear space $C_{[a,b]}$, one of the most important spaces in analysis.

Example 4. The set $l_2$ of all infinite sequences

$$x = (x_1, x_2, \ldots, x_k, \ldots) \qquad (1)$$

of real or complex numbers $x_1, x_2, \ldots, x_k, \ldots$ satisfying the convergence condition

$$\sum_{k=1}^{\infty} |x_k|^2 < \infty,$$

1 More generally, one can consider linear spaces over an arbitrary field. 

2 It will be noted that certain symbols like $R^n$, $C_{[a,b]}$, $l_2$ and $m$ are used here with somewhat different meanings than in Sec. 5.1. The point is that there is no metric here, at least for the time being, while on the other hand, sums and scalar multiples of vectors were not defined in Chaps. 2 and 3.
equipped with operations

$$(x_1, x_2, \ldots, x_k, \ldots) + (y_1, y_2, \ldots, y_k, \ldots) = (x_1 + y_1, x_2 + y_2, \ldots, x_k + y_k, \ldots),$$
$$\alpha(x_1, x_2, \ldots, x_k, \ldots) = (\alpha x_1, \alpha x_2, \ldots, \alpha x_k, \ldots), \qquad (2)$$

is a linear space. The fact that

$$\sum_{k=1}^{\infty} |x_k|^2 < \infty, \qquad \sum_{k=1}^{\infty} |y_k|^2 < \infty$$

implies

$$\sum_{k=1}^{\infty} |x_k + y_k|^2 < \infty$$

is an immediate consequence of the elementary inequality

$$|x_k + y_k|^2 \le 2(|x_k|^2 + |y_k|^2).$$
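For completeness, the elementary inequality can be derived as follows (written with absolute values so that it covers complex terms as well):

$$|x_k + y_k|^2 \le (|x_k| + |y_k|)^2 = 2(|x_k|^2 + |y_k|^2) - (|x_k| - |y_k|)^2 \le 2(|x_k|^2 + |y_k|^2).$$

Summing over $k$ gives

$$\sum_{k=1}^{\infty} |x_k + y_k|^2 \le 2\sum_{k=1}^{\infty} |x_k|^2 + 2\sum_{k=1}^{\infty} |y_k|^2 < \infty,$$

while closure under multiplication by numbers is immediate from $\sum_{k} |\alpha x_k|^2 = |\alpha|^2 \sum_{k} |x_k|^2$.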

Example 5. Let $c$ be the set of all convergent sequences (1), $c_0$ the set of all sequences (1) converging to zero, $m$ the set of all bounded sequences, and $R^\infty$ the set of all sequences (1). Then $c$, $c_0$, $m$ and $R^\infty$ are all linear spaces, provided that in each case addition of sequences and multiplication of sequences by numbers are defined by (2).

Since linear spaces are defined in terms of two operations, addition of elements and multiplication of elements by numbers, it is natural to introduce

Definition 2. Two linear spaces $L$ and $L^*$ are said to be isomorphic if there is a one-to-one correspondence $x \leftrightarrow x^*$ between $L$ and $L^*$ which preserves operations, in the sense that

$$x \leftrightarrow x^*, \quad y \leftrightarrow y^*$$

(where $x, y \in L$, $x^*, y^* \in L^*$) implies

$$x + y \leftrightarrow x^* + y^*$$

and

$$\alpha x \leftrightarrow \alpha x^*$$

($\alpha$ an arbitrary number).

Remark. It is sometimes convenient to regard isomorphic linear spaces as different “realizations” of one and the same linear space.


13.2. Linear dependence. We say that the elements $x, y, \ldots, w$ of a linear space $L$ are linearly dependent if there exist numbers $\alpha, \beta, \ldots, \lambda$, not all zero, such that³

$$\alpha x + \beta y + \cdots + \lambda w = 0. \qquad (3)$$

3 The left-hand side of (3) is called a linear combination of the elements x, y, . . ., w. 


If no such numbers exist, the elements $x, y, \ldots, w$ are said to be linearly independent. In other words, the elements $x, y, \ldots, w$ are linearly independent if and only if (3) implies

$$\alpha = \beta = \cdots = \lambda = 0.$$

More generally, the elements $x, y, \ldots$ belonging to some infinite set $E \subset L$ are said to be linearly independent if the elements belonging to every finite subset of $E$ are linearly independent.

A linear space $L$ is said to be $n$-dimensional (or of dimension $n$) if $n$ linearly independent elements can be found in $L$, but any $n + 1$ elements of $L$ are linearly dependent. Suppose $n$ linearly independent elements can be found in $L$ for every $n$. Then $L$ is said to be infinite-dimensional; otherwise $L$ is said to be finite-dimensional. Any set of $n$ linearly independent elements of an $n$-dimensional space $L$ is called a basis in $L$.
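For vectors in $R^n$ with rational coordinates, linear independence can be tested by Gaussian elimination: the elements are independent exactly when the rank of the matrix having them as rows equals their number. The sketch below (function names hypothetical) does this with exact `Fraction` arithmetic. It illustrates the definition only in the finite-dimensional case and says nothing about the infinite-dimensional spaces that are the main concern here.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors, computed by Gaussian elimination
    over the rationals (exact arithmetic, so no rounding issues)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True iff alpha x + ... + lambda w = 0 forces all coefficients to vanish,
    i.e. iff the rank equals the number of vectors."""
    return rank(vectors) == len(vectors)


e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(linearly_independent(e))                       # True: a basis of R^3
print(linearly_independent(e + [(1, 2, 3)]))         # False: any 4 vectors in R^3 are dependent
print(linearly_independent([(1, 2, 3), (2, 4, 6)]))  # False: the second is twice the first
```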

Remark. The typical course on linear algebra deals with finite-dimensional linear spaces. Here, however, we will be primarily concerned with infinite-dimensional spaces, the case of greater interest from the standpoint of mathematical analysis.

13.3. Subspaces. Given a nonempty subset $L'$ of a linear space $L$, suppose $L'$ is itself a linear space with respect to the operations of addition and multiplication defined in $L$. Then $L'$ is said to be a subspace (of $L$). In other words, we say that $L' \subset L$ is a subspace if $x \in L'$, $y \in L'$ implies $\alpha x + \beta y \in L'$ for arbitrary $\alpha$ and $\beta$. The “trivial space” consisting of the zero element alone is a subspace of every linear space $L$. At the opposite extreme, $L$ can always be regarded as a subspace of itself. By a proper subspace of a linear space $L$, we mean a subspace which is distinct from $L$ itself and contains at least one nonzero element.

Example 1. Let $L$ be any linear space, and $x$ any nonzero element of $L$. Then the set $\{\lambda x\}$ of all scalar multiples of $x$, where $\lambda$ ranges over all (real or complex) numbers, is obviously a one-dimensional subspace of $L$, in fact a proper subspace if the dimension of $L$ exceeds 1.

Example 2. The set $P_{[a,b]}$ of all polynomials on $[a, b]$ is a proper subspace of the set $C_{[a,b]}$ of all continuous functions on $[a, b]$. Like $C_{[a,b]}$ itself, $P_{[a,b]}$ is infinite-dimensional. At the same time, $C_{[a,b]}$ is itself a proper subspace of the set of all functions on $[a, b]$, both continuous and discontinuous.

Example 3. Each of the linear spaces $l_2$, $c_0$, $c$, $m$ and $R^\infty$ (in that order) is a proper subspace of the next one.

Given a linear space $L$, let $\{x_\alpha\}$ be any nonempty set of elements $x_\alpha \in L$. Then $L$ has a smallest subspace (possibly $L$ itself) containing $\{x_\alpha\}$.⁴ In fact,

4 Here we use curly brackets in the same way as in footnote 10, p. 92. 
there is at least one such subspace, namely $L$ itself. Moreover, it is clear that the intersection of any system of subspaces $\{L_\gamma\}$ is itself a subspace, since if $L^* = \bigcap_\gamma L_\gamma$ and $x, y \in L^*$, then $\alpha x + \beta y \in L^*$ for all $\alpha$ and $\beta$ (why?). The smallest subspace of $L$ containing the set $\{x_\alpha\}$ is then just the intersection of all subspaces containing $\{x_\alpha\}$. This minimal subspace, denoted by $L(\{x_\alpha\})$, is called the (linear) subspace generated by $\{x_\alpha\}$ or the linear hull of $\{x_\alpha\}$.

13.4. Factor spaces. Let $L$ be a linear space and $L'$ a subspace of $L$. Then two elements $x, y \in L$ are said to belong to the same (residue) class generated by $L'$ if the difference $x - y$ belongs to $L'$. The set of all such classes is called the factor space (or quotient space) of $L$ relative to $L'$, denoted by $L/L'$. The operations of addition of elements and multiplication of elements by numbers can be introduced in a factor space $L/L'$ in the following natural way: Given two elements of $L/L'$, i.e., two classes $\xi$ and $\eta$, we choose a “representative” from each class, say $x$ from $\xi$ and $y$ from $\eta$. We then define the sum $\xi + \eta$ of the classes $\xi$ and $\eta$ to be the class containing the element $x + y$, while the product $\alpha\xi$ of the number $\alpha$ and the class $\xi$ is defined to be the class containing the element $\alpha x$. Here we rely on the fact that the classes $\xi + \eta$ and $\alpha\xi$ are independent of the choice of the “representatives” $x$ and $y$ (why?).

Theorem 1. Every factor space $L/L'$, with operations defined in the way just described, is a linear space.

Proof. We need only verify that $L/L'$ satisfies the three axioms in Definition 1. This is almost trivial (give the details). ∎

Let $L$ be a linear space and $L'$ a subspace of $L$. Then the dimension of the factor space $L/L'$ is called the codimension of $L'$ in $L$.

Theorem 2. Let $L'$ be a subspace of a linear space $L$. Then $L'$ has finite codimension $n$ if and only if there are linearly independent elements $x_1, \ldots, x_n$ in $L$ such that every element $x \in L$ has a unique representation of the form

$$x = \alpha_1 x_1 + \cdots + \alpha_n x_n + y, \qquad (4)$$

where $\alpha_1, \ldots, \alpha_n$ are numbers and $y \in L'$.

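Proof. Suppose every element $x \in L$ has a unique representation of the form (4). Given any class $\xi \in L/L'$, let $x$ be any element of $\xi$ and let $\xi_k$ be the class containing $x_k$ ($k = 1, \ldots, n$). Then (4) clearly implies

$$\xi = \alpha_1 \xi_1 + \cdots + \alpha_n \xi_n.$$

Hence $\xi_1, \ldots, \xi_n$ is a basis for $L/L'$. (The linear independence of $\xi_1, \ldots, \xi_n$ follows from the uniqueness of the representation (4): if $\beta_1 \xi_1 + \cdots + \beta_n \xi_n = 0$, then $\beta_1 x_1 + \cdots + \beta_n x_n = y \in L'$, so $0 = \beta_1 x_1 + \cdots + \beta_n x_n - y$ is a representation of the form (4), and comparing it with $0 = 0 x_1 + \cdots + 0 x_n + 0$ gives $\beta_1 = \cdots = \beta_n = 0$.) In other words, $L/L'$ has dimension $n$, or equivalently $L'$ has codimension $n$.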


Conversely, suppose $L'$ has codimension $n$, so that $L/L'$ has dimension $n$. Then $L/L'$ has a basis $\xi_1, \ldots, \xi_n$. Given any $x \in L$, let $\xi$ be the class in $L/L'$ containing $x$. Then

$$\xi = \alpha_1 \xi_1 + \cdots + \alpha_n \xi_n$$

for suitable numbers $\alpha_1, \ldots, \alpha_n$. But this means that every element in $\xi$, in particular $x$, differs only by an element $y \in L'$ from a linear combination of elements $x_1, \ldots, x_n$, where $x_k$ is any fixed element of $\xi_k$ ($k = 1, \ldots, n$), i.e.,

$$x = \alpha_1 x_1 + \cdots + \alpha_n x_n + y \quad (y \in L') \qquad (5)$$

(the linear independence of $x_1, \ldots, x_n$ follows from that of $\xi_1, \ldots, \xi_n$). Suppose there is another such representation

$$x = \alpha'_1 x_1 + \cdots + \alpha'_n x_n + y' \quad (y' \in L'). \qquad (5')$$

Then, subtracting (5') from (5), we get

$$0 = (\alpha_1 - \alpha'_1) x_1 + \cdots + (\alpha_n - \alpha'_n) x_n + y'' \quad (y'' \in L'),$$

and hence

$$0 = (\alpha_1 - \alpha'_1) \xi_1 + \cdots + (\alpha_n - \alpha'_n) \xi_n,$$

where in the last equation $0$ means the class containing the zero element of $L$, i.e., the space $L'$ itself. But $\xi_1, \ldots, \xi_n$ are linearly independent, and hence $\alpha_1 = \alpha'_1, \ldots, \alpha_n = \alpha'_n$. ∎

13.5. Linear functionals. A numerical function $f$ defined on a linear space $L$ is called a functional (on $L$).⁵ A functional $f$ is said to be additive if

$$f(x + y) = f(x) + f(y)$$

for all $x, y \in L$ and homogeneous if

$$f(\alpha x) = \alpha f(x)$$

for every number $\alpha$. A functional defined on a complex linear space is called conjugate-homogeneous if

$$f(\alpha x) = \bar{\alpha} f(x)$$

for every number $\alpha$, where $\bar{\alpha}$ is the complex conjugate of $\alpha$. An additive homogeneous functional is called a linear functional, while an additive conjugate-homogeneous functional is called a conjugate-linear functional.

5 The word “functional” has already been used in a somewhat different sense in Sec. 12.1, where a functional means a real function defined on a function space (topological or metric). Later on, we will deal with linear spaces which are also metric spaces and have functions as their elements. The two uses of the word “functional” will then coincide (if we allow complex-valued functionals).

Example 1. Let $R^n$ be real $n$-space, with elements $x = (x_1, \ldots, x_n)$, and let $a = (a_1, \ldots, a_n)$ be a fixed element of $R^n$. Then

$$f(x) = \sum_{k=1}^{n} a_k x_k$$

is a linear functional on $R^n$. Similarly, if $a = (a_1, \ldots, a_n)$ is a fixed element of $C^n$, then

$$f(x) = \sum_{k=1}^{n} a_k \bar{x}_k$$

is a conjugate-linear functional on complex $n$-space $C^n$.
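The difference between the two functionals of Example 1 shows up as soon as complex scalars are allowed. The Python sketch below (function names and sample vectors are arbitrary choices made for illustration) evaluates both functionals on $C^3$ and compares $f(\alpha x)$ with $\alpha f(x)$ and with $\bar{\alpha} f(x)$. Floating-point complex arithmetic is used, so the comparisons allow a small tolerance.

```python
def f_linear(a, x):
    """f(x) = sum_k a_k x_k (linear in x)."""
    return sum(ak * xk for ak, xk in zip(a, x))


def f_conj_linear(a, x):
    """f(x) = sum_k a_k conj(x_k) (conjugate-linear in x)."""
    return sum(ak * complex(xk).conjugate() for ak, xk in zip(a, x))


def close(u, v, tol=1e-12):
    """Approximate equality of complex numbers."""
    return abs(u - v) < tol


a = [1 + 2j, -1j, 3]
x = [2 - 1j, 1j, 0.5]
y = [1j, 4, -2 + 1j]
alpha = 2 + 3j
ax = [alpha * xk for xk in x]
x_plus_y = [xk + yk for xk, yk in zip(x, y)]

# Both functionals are additive.
print(close(f_linear(a, x_plus_y), f_linear(a, x) + f_linear(a, y)))                  # True
print(close(f_conj_linear(a, x_plus_y), f_conj_linear(a, x) + f_conj_linear(a, y)))   # True
# Homogeneity versus conjugate-homogeneity.
print(close(f_linear(a, ax), alpha * f_linear(a, x)))                                  # True
print(close(f_conj_linear(a, ax), alpha.conjugate() * f_conj_linear(a, x)))            # True
print(close(f_conj_linear(a, ax), alpha * f_conj_linear(a, x)))                        # False
```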

Example 2. Consider the integral

$$I(x) = \int_a^b x(t)\, dt,$$

or more generally

$$I(x) = \int_a^b x(t)\varphi(t)\, dt,$$

where $\varphi(t)$ is a fixed continuous function on $[a, b]$. It follows at once from elementary properties of integrals that $I(x)$ is a linear functional on $C_{[a,b]}$. Similarly, the integral

$$I(x) = \int_a^b \overline{x(t)}\, dt,$$

or more generally

$$I(x) = \int_a^b \overline{x(t)}\,\varphi(t)\, dt,$$

is a conjugate-linear functional on the complex space $C_{[a,b]}$.

Example 3. Another kind of linear functional on the space $C_{[a,b]}$ is the functional

$$\delta_{t_0}(x) = x(t_0),$$

which assigns to each function $x(t) \in C_{[a,b]}$ its value at some fixed point $t_0 \in [a, b]$. In mathematical physics, particularly in quantum mechanics, this functional is often written in the form

$$\delta_{t_0}(x) = \int_a^b x(t)\,\delta(t - t_0)\, dt,$$

where $\delta(t)$ is a “fictitious” or “generalized” function, called the (Dirac) delta function, which equals zero everywhere except at $t = 0$ and has an integral equal to 1.⁶ As we will see in Sec. 20.3, the delta function can be represented as the limit, in a suitable sense, of a sequence of “true” functions $\varphi_n$, each vanishing outside of some $\varepsilon_n$-neighborhood of the point $t = 0$ and satisfying the condition

$$\int_{-\varepsilon_n}^{\varepsilon_n} \varphi_n(t)\, dt = 1$$

($\varepsilon_n \to 0$ as $n \to \infty$).
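The limiting relation just mentioned can be observed numerically. The sketch below takes, as one convenient choice made only for this illustration (Sec. 20.3 may use a different sequence), the “hat” functions $\varphi_\varepsilon(t) = \varepsilon^{-1}\max(0, 1 - |t|/\varepsilon)$, which vanish outside the $\varepsilon$-neighborhood of $0$ and have integral $1$. It approximates $\int_a^b x(t)\varphi_\varepsilon(t - t_0)\,dt$ by the midpoint rule on $[0, 1]$; as $\varepsilon$ shrinks the values approach $x(t_0) = \delta_{t_0}(x)$. The function names and grid size are hypothetical.

```python
import math
from typing import Callable


def hat(eps: float) -> Callable[[float], float]:
    """phi(t) = max(0, 1 - |t|/eps) / eps: zero outside (-eps, eps), integral 1."""
    def phi(t: float) -> float:
        return max(0.0, 1.0 - abs(t) / eps) / eps
    return phi


def smeared_value(x: Callable[[float], float], t0: float, eps: float,
                  a: float = 0.0, b: float = 1.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of x(t) * phi(t - t0) over [a, b]."""
    phi = hat(eps)
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += x(t) * phi(t - t0)
    return total * h


for eps in (0.1, 0.01, 0.001):
    print(eps, smeared_value(math.cos, 0.3, eps), math.cos(0.3))
```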

Example 4. Let $n$ be a fixed positive integer, and let

$$x = (x_1, x_2, \ldots, x_k, \ldots)$$

be an arbitrary element of $l_2$. Then

$$f_n(x) = x_n$$

is obviously a linear functional on $l_2$. The same functional can be defined on other spaces whose elements are sequences, e.g., on the spaces $c_0$, $c$, $m$ and $R^\infty$ considered in Example 5, p. 120.


13.6. The null space of a functional. Hyperplanes. Let $f$ be a linear functional defined on a linear space $L$. Then the set $L_f$ of all elements $x \in L$ such that

$$f(x) = 0$$

is called the null space of $f$. It will be assumed that $f$ is nontrivial, i.e., that $f(x) \ne 0$ for at least one (and hence infinitely many) $x \in L$, so that the set $L - L_f$ is nonempty. Obviously $L_f$ is a subspace of $L$, since $x, y \in L_f$ implies

$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y) = 0.$$

Theorem 3. Let $x_0$ be any fixed element of $L - L_f$. Then every element $x \in L$ has a unique representation of the form

$$x = \alpha x_0 + y,$$

where $y \in L_f$.

Proof. Clearly $f(x_0) \ne 0$, and in particular $x_0 \ne 0$. There is no loss of generality in assuming that $f(x_0) = 1$, since otherwise we need only replace $x_0$ by $x_0/f(x_0)$, noting that

$$f\left(\frac{x_0}{f(x_0)}\right) = \frac{f(x_0)}{f(x_0)} = 1.$$

Given any $x \in L$, let

$$y = x - \alpha x_0,$$

where

$$\alpha = f(x).$$

Then $y \in L_f$, since

$$f(y) = f(x - \alpha x_0) = f(x) - \alpha f(x_0) = f(x) - \alpha = 0.$$

6 Clearly, no “true” function can have these properties!
Thus

$$x = \alpha x_0 + y \quad (y \in L_f). \qquad (6)$$

Moreover, the representation (6) is unique. In fact, if there is another such representation

$$x = \alpha' x_0 + y' \quad (y' \in L_f), \qquad (6')$$

then, subtracting (6') from (6), we get

$$(\alpha - \alpha') x_0 = y' - y.$$

If $\alpha = \alpha'$, then obviously $y' = y$. On the other hand, if $\alpha \ne \alpha'$, then

$$x_0 = \frac{y' - y}{\alpha - \alpha'} \in L_f,$$

contrary to the choice of $x_0$. ∎
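The proof of Theorem 3 is constructive: normalize $x_0$ so that $f(x_0) = 1$, then take $\alpha = f(x)$ and $y = x - \alpha x_0$. A minimal Python sketch in $R^3$ follows. The particular functional $f(x) = x_1 - 2x_2 + 3x_3$ and the helper names are assumptions made only for this illustration, and exact `Fraction` arithmetic keeps the check $f(y) = 0$ free of rounding.

```python
from fractions import Fraction

COEFFS = (1, -2, 3)  # hypothetical functional f(x) = x_1 - 2 x_2 + 3 x_3 on R^3


def f(x):
    """The sample linear functional used in this sketch."""
    return sum(c * xi for c, xi in zip(COEFFS, x))


def decompose(x, x0):
    """Return (alpha, x0_normalized, y) with x = alpha * x0_normalized + y and f(y) = 0.

    Follows the proof of Theorem 3: rescale x0 so that f(x0) = 1, then
    alpha = f(x) and y = x - alpha * x0.
    """
    fx0 = f(x0)
    if fx0 == 0:
        raise ValueError("x0 must lie outside the null space L_f")
    x0n = tuple(Fraction(c) / fx0 for c in x0)   # now f(x0n) == 1
    alpha = f(x)
    y = tuple(xi - alpha * ci for xi, ci in zip(x, x0n))
    return alpha, x0n, y


alpha, x0n, y = decompose((4, 1, -1), (1, 1, 1))
print(alpha, [str(c) for c in y], f(y))  # -1 ['9/2', '3/2', '-1/2'] 0
```

Here $f(x_0) = 2$ for $x_0 = (1, 1, 1)$, so the sketch first rescales $x_0$ to $(1/2, 1/2, 1/2)$ before splitting $x = (4, 1, -1)$.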

Corollary 1. Two elements $x_1$ and $x_2$ belong to the same class generated by $L_f$ if and only if $f(x_1) = f(x_2)$.

Proof. It follows from

$$x_1 = f(x_1) x_0 + y_1, \qquad x_2 = f(x_2) x_0 + y_2 \qquad (y_1, y_2 \in L_f)$$

that

$$x_1 - x_2 = (f(x_1) - f(x_2)) x_0 + (y_1 - y_2).$$

Hence $x_1 - x_2 \in L_f$ if and only if the coefficient of $x_0$ vanishes. ∎


Corollary 2. $L_f$ has codimension 1.

Proof. Given any class $\xi$ generated by $L_f$, let $x$ be any element of $\xi$, and choose $f(x)x_0 = \alpha x_0$ as the “representative” of $\xi$. By Corollary 1, this representative is unique, and there is obviously a nonzero class since $x_0 \ne 0$ and $f(x) \ne 0$ for some $x \in L$. Moreover, given any two distinct classes $\xi$ and $\eta$ with representatives $\alpha x_0$ and $\beta x_0$, respectively, we have

$$\beta(\alpha x_0) - \alpha(\beta x_0) = 0$$

and hence

$$\beta\xi - \alpha\eta = 0,$$

where at least one of the numbers $\alpha, \beta$ is nonzero (why?). Therefore any two distinct elements of $L/L_f$ are linearly dependent. It follows that $L/L_f$ is one-dimensional, i.e., $L_f$ has codimension 1. ∎


Corollary 3. Two nontrivial linear functionals $f$ and $g$ with the same null space are proportional.

Proof. Again let $x_0$ be such that $f(x_0) = 1$. Then $g(x_0) \ne 0$. In fact,

$$x = f(x)x_0 + y \quad (y \in L_f),$$

and hence

$$g(x) = f(x)g(x_0) + g(y) = f(x)g(x_0),$$

since $L_g = L_f$. But then $g(x_0) = 0$ would imply that $g$ is trivial, contrary to hypothesis. It follows that

$$g(x) = g(x_0)f(x),$$

i.e., $g(x)$ is proportional to $f(x)$ with constant of proportionality $g(x_0)$. ∎

Given a linear space $L$, let $L' \subset L$ be any subspace of codimension 1. Then every class in $L$ generated by $L'$ is called a hyperplane “parallel to $L'$” (in particular, $L'$ itself is a hyperplane containing $0$, i.e., “going through the origin”). In other words, a hyperplane $M'$ parallel to a subspace $L'$ is the set obtained by subjecting $L'$ to the parallel displacement (or shift) determined by a vector $x_0 \in L$, so that⁷

$$M' = L' + x_0 = \{x : x = x_0 + y,\ y \in L'\}.$$

It is clear that $M' = L'$ if and only if $x_0 \in L'$. We can now give a simple geometric interpretation of linear functionals:

Theorem 4. Given a linear space $L$, let $f$ be a nontrivial linear functional on $L$. Then the set $M_f = \{x : f(x) = 1\}$ is a hyperplane parallel to the null space $L_f$ of the functional. Conversely, let $M' = L' + x_0$ ($x_0 \notin L'$) be any hyperplane parallel to a subspace $L' \subset L$ of codimension 1 and not passing through the origin. Then there exists a unique linear functional $f$ on $L$ such that $M' = \{x : f(x) = 1\}$.

Proof. Given $f$, let $x_0$ be such that $f(x_0) = 1$ (such an $x_0$ can always be found). Then, by Theorem 3, every vector $x \in M_f$ can be represented in the form $x = x_0 + y$, where $y \in L_f$ (the coefficient of $x_0$ is $f(x) = 1$); conversely, $f(x_0 + y) = 1$ for every $y \in L_f$, so that $M_f = L_f + x_0$.

Conversely, given $M' = L' + x_0$ ($x_0 \notin L'$), it follows from Theorem 2 and its proof that every element $x \in L$ can be uniquely represented in the form $x = \alpha x_0 + y$, where $y \in L'$. Setting $f(x) = \alpha$, we get the desired linear functional. The uniqueness of $f$ follows from the fact that if $g(x) = 1$ for $x \in M'$, then $g(y) = 0$ for $y \in L'$ (why?), so that $g(\alpha x_0 + y) = \alpha = f(\alpha x_0 + y)$. ∎

Remark. Thus we have established a one-to-one correspondence between the set of all nontrivial linear functionals on $L$ and the set of all hyperplanes in $L$ which do not pass through the origin.


7 The expression on the right is shorthand for the set of all $x$ such that $x = x_0 + y$, $y \in L'$ (the colon is read “such that”). Similarly, $\{x : f(x) = 1\}$ is the set of all $x$ such that $f(x) = 1$, and so on.
Problem 1. Prove that the set of all polynomials of degree at most $n - 1$ with real (or complex) coefficients is a linear space, isomorphic to the $n$-dimensional vector space $R^n$ (or $C^n$).

Problem 2. Verify that $R^n$ and $C^n$ are $n$-dimensional, as anticipated by the terminology in Example 2, p. 119.

Problem 3. Verify that the spaces $C_{[a,b]}$, $l_2$, $c$, $c_0$, $m$ and $R^\infty$ are all infinite-dimensional.

Problem 4. Given a linear space $L$, a set $\{x_\alpha\}$ of linearly independent elements of $L$ is said to be a Hamel basis (in $L$) if the linear subspace generated by $\{x_\alpha\}$ coincides with $L$. Prove that

a) Every linear space has a Hamel basis;

b) If $\{x_\alpha\}$ is a Hamel basis in $L$, then every vector $x \in L$ has a unique representation as a finite linear combination of vectors from the set $\{x_\alpha\}$;

c) Any two Hamel bases in a linear space $L$ have the same power (cardinal number), called the algebraic dimension of $L$;

d) Two linear spaces are isomorphic if and only if they have the same algebraic dimension.

Problem 5. Let $L'$ be a $k$-dimensional subspace of an $n$-dimensional linear space $L$. Prove that the factor space $L/L'$ has dimension $n - k$.

Problem 6. Let $f, f_1, \ldots, f_n$ be linear functionals on a linear space $L$ such that $f_1(x) = \cdots = f_n(x) = 0$ implies $f(x) = 0$. Prove that there exist constants $\alpha_1, \ldots, \alpha_n$ such that

$$f(x) = \sum_{k=1}^{n} \alpha_k f_k(x)$$

for every $x \in L$.


14. Convex Sets and Functionals. The Hahn-Banach Theorem 

14.1. Convex sets and bodies. Many important topics in the theory of linear spaces rely on the notion of convexity. This notion, stemming from intuitive geometric ideas, can be formulated purely analytically. Given a real linear space $L$, let $x$ and $y$ be any two points of $L$. Then by the (closed) segment in $L$ joining $x$ and $y$ we mean the set of all points in $L$ of the form $\alpha x + \beta y$, where $\alpha, \beta \ge 0$ and $\alpha + \beta = 1$. Such a segment minus its end points $x$ and $y$ is called an open segment. By the interior of a set $M \subset L$, denoted by $I(M)$, we mean the set of all points $x \in M$ with the following property: Given any $y \in L$, there exists a number $\varepsilon = \varepsilon(y) > 0$ such that $x + ty \in M$ if $|t| < \varepsilon$.


Definition 1. A set $M \subset L$ is said to be convex if whenever it contains two points $x$ and $y$, it also contains the segment joining $x$ and $y$.

Definition 2. A convex set is called a convex body if its interior is 
nonempty. 


Example 1. The cube, ball, tetrahedron and half-space are all convex 
bodies in three-dimensional Euclidean space R 3 . On the other hand, the 
line segment, plane and triangle are convex sets in R 3 , but not convex bodies. 

Example 2. As usual, let $C_{[a,b]}$ be the space of all functions continuous on the interval $[a, b]$, and let $M$ be the subset of $C_{[a,b]}$ consisting of all functions satisfying the extra condition

$$|f(t)| \le 1.$$

Then $M$ is convex, since

$$|f(t)| \le 1, \qquad |g(t)| \le 1$$

together with $\alpha, \beta \ge 0$, $\alpha + \beta = 1$ implies

$$|\alpha f(t) + \beta g(t)| \le \alpha + \beta = 1.$$

Example 3. The closed unit sphere in $l_2$, i.e., the set of all points $x = (x_1, x_2, \ldots, x_n, \ldots)$ such that

$$\sum_{n=1}^{\infty} x_n^2 \le 1,$$

is a convex body. Its interior consists of all points $x = (x_1, x_2, \ldots, x_n, \ldots)$ satisfying the condition

$$\sum_{n=1}^{\infty} x_n^2 < 1.$$

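Example 4. The Hilbert cube $\Pi$ (see Example 5, p. 98) is a convex set in $l_2$, but not a convex body. In fact,

$$|x_n| \le \frac{1}{2^{n-1}} \quad (n = 1, 2, \ldots)$$

if $x \in \Pi$. Let

$$y_0 = \left(1, \frac{1}{2}, \ldots, \frac{1}{n}, \ldots\right),$$

and suppose $x + ty_0 \in \Pi$, i.e.,

$$\left|x_n + \frac{t}{n}\right| \le \frac{1}{2^{n-1}} \quad (n = 1, 2, \ldots).$$

Then

$$\frac{|t|}{n} \le \left|x_n + \frac{t}{n}\right| + |x_n| \le \frac{1}{2^{n-1}} + \frac{1}{2^{n-1}} = \frac{1}{2^{n-2}},$$

i.e., $|t| \le n/2^{n-2}$ for all $n = 1, 2, \ldots$, which implies $t = 0$. Therefore the interior of $\Pi$ is empty.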
Theorem 1. If $M$ is a convex set, then so is its interior $I(M)$.

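Proof. Suppose $x, y \in I(M)$, and let $z = \alpha x + \beta y$, $\alpha, \beta \ge 0$, $\alpha + \beta = 1$. Then, given any $a \in L$, there are numbers $\varepsilon_1 > 0$, $\varepsilon_2 > 0$ such that the points $x + t_1 a$, $y + t_2 a$ belong to $M$ if $|t_1| < \varepsilon_1$, $|t_2| < \varepsilon_2$. Therefore, by the convexity of $M$,

$$\alpha(x + ta) + \beta(y + ta) = z + ta$$

belongs to $M$ if $|t| < \varepsilon = \min\{\varepsilon_1, \varepsilon_2\}$, i.e., $z \in I(M)$. ∎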

Theorem 2. The intersection

$$M = \bigcap_\alpha M_\alpha$$

of any number of convex sets $M_\alpha$ is itself a convex set.

Proof. Let $x$ and $y$ be any two points of $M$. Then $x$ and $y$ belong to every $M_\alpha$, and hence so does the segment joining $x$ and $y$. But then the segment joining $x$ and $y$ belongs to $M$. ∎

Given any subset A of a linear space L, there is a smallest convex set 
containing A, i.e., the intersection of all convex sets containing A (there 
is at least one convex set containing A, namely L itself). This minimal 
convex set containing A is called the convex hull of A. For example, the 
convex hull of three noncollinear points is the triangle with these points as 
vertices. 

14.2. Convex functionals. Next we introduce the important concept of a convex functional:

Definition 3. A functional $p$ defined on a real linear space $L$ is said to be convex if

1) $p(x) \ge 0$ for all $x \in L$ (nonnegativity);

2) $p(\alpha x) = \alpha p(x)$ for all $x \in L$ and all $\alpha > 0$;

3) $p(x + y) \le p(x) + p(y)$ for all $x, y \in L$.

Remark. Here, unlike the case of linear functionals, we do not assume that $p(x)$ is finite for all $x \in L$, i.e., we allow the case where $p(x) = +\infty$ for some $x \in L$.

Example 1. The length of a vector in Euclidean $n$-space $R^n$ is a convex functional. The first and second conditions are immediate consequences of the definition of length in $R^n$ (length is inherently nonnegative), while the third condition means that the length of the sum of two vectors does not exceed the sum of their lengths (the triangle inequality).


Example 2. Let $M$ be the space of bounded functions $x(s)$ defined on some set $S$, and let $s_0$ be a fixed point of $S$. Then

$$p_{s_0}(x) = |x(s_0)|$$

is a convex functional.

Example 3. Let $m$ be the space of bounded numerical sequences $x = (x_1, x_2, \ldots, x_k, \ldots)$. Then the functional

$$p(x) = \sup_k |x_k|$$

is convex.

14.3. The Minkowski functional. Next we consider the connection between convex functionals and convex sets:

Theorem 3. If $p$ is a convex functional on a linear space $L$ and $k$ is any positive number, then the set

$$E = \{x : p(x) \le k\}$$

is convex. If $p$ is finite, then $E$ is a convex body with interior

$$I(E) = \{x : p(x) < k\}$$

(so that in particular $0 \in I(E)$).

Proof. If $x, y \in E$, $\alpha, \beta \ge 0$, $\alpha + \beta = 1$, then

$$p(\alpha x + \beta y) \le \alpha p(x) + \beta p(y) \le k,$$

i.e., $E$ is a convex set. Now suppose $p$ is finite, and let $p(x) < k$, $t > 0$, $y \in L$. Then

$$p(x \pm ty) \le p(x) + t\,p(\pm y).$$

If $p(-y) = p(y) = 0$, then $x \pm ty \in E$ for all $t$. On the other hand, if at least one of the numbers $p(y)$, $p(-y)$ is nonzero, then $x \pm ty \in E$ if

$$t < \frac{k - p(x)}{\max\{p(y), p(-y)\}}.$$

∎

Suppose we choose a definite value of $k$, say $k = 1$. Then every finite convex functional $p$ uniquely determines a convex body $E$ in $L$, such that $0 \in I(E)$. Conversely, suppose $E$ is a convex body whose interior contains the point $0$, and consider the functional

$$p_E(x) = \inf\left\{r : \frac{x}{r} \in E,\ r > 0\right\}, \qquad (1)$$

called the Minkowski functional of the convex body $E$. Then we have

Theorem 4. The Minkowski functional (1) is finite and convex.
Proof. Given any $x \in L$, the element $x/r$ belongs to $E$ if $r$ is sufficiently large (why?), and hence $p_E(x)$ is nonnegative and finite. Clearly $p_E(0) = 0$. If $\alpha > 0$, then

$$p_E(\alpha x) = \inf\left\{r > 0 : \frac{\alpha x}{r} \in E\right\} = \inf\left\{\alpha r' > 0 : \frac{x}{r'} \in E\right\} = \alpha \inf\left\{r' > 0 : \frac{x}{r'} \in E\right\} = \alpha p_E(x). \qquad (2)$$

Next, given any $\varepsilon > 0$ and any $x_1, x_2 \in L$, choose numbers $r_i$ ($i = 1, 2$) such that

$$p_E(x_i) < r_i < p_E(x_i) + \varepsilon.$$

Then $x_i/r_i \in E$ (there is an $s < r_i$ with $x_i/s \in E$, and $x_i/r_i = (s/r_i)(x_i/s) + (1 - s/r_i) \cdot 0$ lies in $E$ because $E$ is convex and contains $0$). If $r = r_1 + r_2$, then

$$\frac{x_1 + x_2}{r} = \frac{r_1}{r} \cdot \frac{x_1}{r_1} + \frac{r_2}{r} \cdot \frac{x_2}{r_2}$$

belongs to the segment with end points $x_1/r_1$ and $x_2/r_2$. Since $E$ is convex, this segment and hence the point $(x_1 + x_2)/r$ belongs to $E$. It follows that

$$p_E(x_1 + x_2) \le r = r_1 + r_2 < p_E(x_1) + p_E(x_2) + 2\varepsilon$$

or

$$p_E(x_1 + x_2) \le p_E(x_1) + p_E(x_2), \qquad (3)$$

since $\varepsilon$ is arbitrary. Together (2) and (3) imply that $p_E(x)$ is convex. ∎
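The Minkowski functional can be computed numerically from nothing more than a membership test for $E$. Because $E$ is convex and contains $0$, the set of $r > 0$ with $x/r \in E$ is an interval unbounded above (if $x/r \in E$ and $r' > r$, then $x/r' = (r/r')(x/r)$ lies on the segment from $0$ to $x/r$), so its infimum can be located by bisection. The Python sketch below assumes $E \subset R^2$ is described by a predicate `in_E`; that name, the tolerance and the three sample bodies are hypothetical choices for illustration.

```python
from typing import Callable, Sequence

Point = Sequence[float]


def minkowski(in_E: Callable[[Point], bool], x: Point, tol: float = 1e-12) -> float:
    """Approximate p_E(x) = inf{r > 0 : x/r in E} by bisection.

    Assumes E is convex with 0 an interior point, so the admissible r form an
    interval unbounded above and repeated doubling eventually finds one."""
    if all(xi == 0 for xi in x):
        return 0.0                            # p_E(0) = 0
    def inside(r: float) -> bool:
        return in_E([xi / r for xi in x])
    hi = 1.0
    while not inside(hi):                     # grow until x/hi lands in E
        hi *= 2.0
    lo = 0.0                                  # invariant: inside(hi) holds; lo == 0 or inside(lo) fails
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if inside(mid):
            hi = mid
        else:
            lo = mid
    return hi


def square(p: Point) -> bool:    # p_E = max(|x_1|, |x_2|)
    return abs(p[0]) <= 1 and abs(p[1]) <= 1


def disk(p: Point) -> bool:      # p_E = Euclidean length
    return p[0] ** 2 + p[1] ** 2 <= 1


def lopsided(p: Point) -> bool:  # convex, contains 0 in its interior, not symmetric
    return -2 <= p[0] <= 1 and abs(p[1]) <= 1


print(minkowski(square, (3.0, -0.5)))   # approximately 3
print(minkowski(disk, (3.0, 4.0)))      # approximately 5
print(minkowski(lopsided, (1.0, 0.0)), minkowski(lopsided, (-1.0, 0.0)))  # approx. 1 and 0.5
```

The lopsided body shows that a Minkowski functional need not satisfy $p_E(-x) = p_E(x)$; Definition 3 asks only for homogeneity with respect to positive factors.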

14.4. The Hahn-Banach theorem. Given a real linear space $L$ and any subspace $L_0 \subset L$, let $f_0$ be a linear functional defined on $L_0$. Then a linear functional $f$ defined on the whole space $L$ is said to be an extension of the functional $f_0$ if

$$f(x) = f_0(x) \quad \text{for all } x \in L_0.$$

A problem frequently encountered in analysis is that of extending an arbitrary linear functional, originally defined on some subspace, onto a larger space. A central role in problems of this kind is played by

Theorem 5 (Hahn-Banach). Let $p$ be a finite convex functional defined on a real linear space $L$, and let $L_0$ be a subspace of $L$. Suppose $f_0$ is a linear functional on $L_0$ satisfying the condition

$$f_0(x) \le p(x) \qquad (4)$$

on $L_0$. Then $f_0$ can be extended to a linear functional on $L$ satisfying (4) on the whole space $L$. More exactly, there is a linear functional $f$ defined on $L$ and equal to $f_0$ at every point of $L_0$, such that $f(x) \le p(x)$ on $L$.
Proof. Suppose $L_0 \ne L$, since otherwise the theorem is trivial. We
begin by showing that $f_0$ can be extended onto a larger space $L'$ without
violating the condition (4). Let $z$ be any element of $L - L_0$, and let $L'$
be the subspace generated by $L_0$ and the element $z$, i.e., the set of all linear
combinations

$$x + tz \qquad (x \in L_0,\ t \text{ real}).$$

If $f$ is to be an extension of $f_0$ onto $L'$, we must have

$$f(x + tz) = f_0(x) + t f(z)$$

or

$$f(x + tz) = f_0(x) + tc \qquad (5)$$

after setting $f(z) = c$. We now choose $c$ such that the "majorization"
condition $f(x + tz) \le p(x + tz)$ is satisfied, i.e., such that

$$f_0(x) + tc \le p(x + tz).$$

Dividing by $t$ (which reverses the inequality when $t < 0$) and using $p(\alpha x) = \alpha p(x)$ for $\alpha > 0$, we can write this condition as

$$f_0\Big(\frac{x}{t}\Big) + c \le p\Big(\frac{x}{t} + z\Big), \quad\text{i.e.,}\quad c \le p\Big(\frac{x}{t} + z\Big) - f_0\Big(\frac{x}{t}\Big) \qquad (6)$$

if $t > 0$, and as

$$f_0\Big(\frac{x}{t}\Big) + c \ge -p\Big(-\frac{x}{t} - z\Big), \quad\text{i.e.,}\quad c \ge -p\Big(-\frac{x}{t} - z\Big) - f_0\Big(\frac{x}{t}\Big) \qquad (7)$$

if $t < 0$. Hence we want to show that there is always a value of $c$ satisfying
(6) and (7). Let $y'$ and $y''$ be arbitrary elements of $L_0$. Then it follows
from the inequality

$$f_0(y'') - f_0(y') \le p(y'' - y') = p\big((y'' + z) - (y' + z)\big) \le p(y'' + z) + p(-y' - z)$$

that

$$-f_0(y'') + p(y'' + z) \ge -f_0(y') - p(-y' - z). \qquad (8)$$

Let

$$c' = \sup_{y'} \big[-f_0(y') - p(-y' - z)\big], \qquad c'' = \inf_{y''} \big[-f_0(y'') + p(y'' + z)\big].$$

Then

$$c'' \ge c'$$

by (8) and the fact that $y'$ and $y''$ are arbitrary. Hence, choosing $c$ such that

$$c'' \ge c \ge c',$$

we find (taking $y'' = x/t$ in the definition of $c''$ and $y' = x/t$ in that of $c'$) that the functional $f$ defined on $L'$ by the formula (5) satisfies the
condition $f(x) \le p(x)$. Thus we have succeeded in showing that if $f_0$ is
defined on a subspace $L_0 \subset L$ and satisfies (4) on $L_0$, then $f_0$ can be
extended onto a larger subspace $L'$ with the condition (4) being preserved.
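The choice of $c$ can be tried out numerically. The Python sketch below is a hypothetical finite-dimensional example: $L = R^2$, $L_0$ the first coordinate axis, $f_0(s, 0) = s$, $p(x) = |x_1| + |x_2|$ and $z = (0, 1)$. It approximates $c'$ and $c''$ by letting $y$ run over a finite grid in $L_0$ (an assumption for illustration; the text takes the sup and inf over all of $L_0$), then checks that the extension (5) with $c$ between them stays below $p$.

```python
def l1_norm(v):
    """p(v) = |v_1| + |v_2|, a finite convex functional on R^2."""
    return sum(abs(t) for t in v)

def extension_interval(f0_coef, p, u, z, grid):
    """Grid approximations of c' and c'' for L0 = span{u}, f0(s*u) = s*f0_coef.

      c'  = sup_y [-f0(y) - p(-y - z)],   c'' = inf_y [-f0(y) + p(y + z)],
    where y = s*u and s runs over the finite list `grid`."""
    def y_plus_z(s):
        return [s * ui + zi for ui, zi in zip(u, z)]
    c_low = max(-s * f0_coef - p([-t for t in y_plus_z(s)]) for s in grid)
    c_high = min(-s * f0_coef + p(y_plus_z(s)) for s in grid)
    return c_low, c_high

# L = R^2, L0 = first coordinate axis, f0(s, 0) = s, z = (0, 1).
grid = [k / 10 for k in range(-100, 101)]
c_low, c_high = extension_interval(1.0, l1_norm, [1.0, 0.0], [0.0, 1.0], grid)
c = 0.5 * (c_low + c_high)       # any c with c' <= c <= c'' would do

def f(v):
    """Extension (5): f(x + t z) = f0(x) + t c, i.e. f(v1, v2) = v1 + c * v2."""
    return v[0] + c * v[1]

print(c_low, c_high)             # approximately -1 and 1
```

In this example $c' = -1$ and $c'' = 1$ exactly, so every $f(x_1, x_2) = x_1 + c x_2$ with $|c| \le 1$ is an admissible extension; the extension is in general not unique.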

To complete the proof, suppose first that $L$ is generated by a countable
set of elements $x_1, x_2, \dots, x_n, \dots$ in $L$. Then we construct a functional
on $L$ by induction, i.e., by constructing a sequence of subspaces

$$L^{(1)} = \{L_0, x_1\},\quad L^{(2)} = \{L^{(1)}, x_2\},\ \dots,$$

each contained in the next. Here $\{L^{(k)}, x_{k+1}\}$ denotes the minimal linear
subspace of $L$ containing $L^{(k)}$ and $x_{k+1}$. This process extends the
functional onto the whole space $L$, since every element $x \in L$ belongs to
some subspace $L^{(k)}$.

More generally, i.e., in the case where there is no countable set
generating $L$, the theorem is proved by applying Zorn's lemma (see
p. 28). The set $\mathfrak{F}$ of all possible extensions of the functional $f_0$ satisfying
the majorization condition $f(x) \le p(x)$ is partially ordered (one extension
precedes another if the second extends the first), and each linearly
ordered subset $\mathfrak{F}_0 \subset \mathfrak{F}$ has an upper bound. This upper bound is the
functional which is defined on the union of the domains of all functionals
$f \in \mathfrak{F}_0$ and coincides with every such functional $f$ on the domain of $f$.
Hence, by Zorn's lemma, $\mathfrak{F}$ has a maximal element $f$. Clearly $f$ must be
the desired functional extending $f_0$ onto $L$ and satisfying the condition
$f(x) \le p(x)$, since otherwise we could extend $f$ in turn, by the method
described above, from the proper subspace on which it is defined onto a
larger subspace, thereby contradicting the maximality of $f$. ∎

Next we turn to the case of complex linear spaces: 

Definition 3'. A functional $p$ defined on a complex linear space $L$ is
said to be convex if

1) $p(x) \ge 0$ for all $x \in L$ (nonnegativity);

2) $p(\alpha x) = |\alpha|\, p(x)$ for all $x \in L$ and all complex $\alpha$;

3) $p(x + y) \le p(x) + p(y)$ for all $x, y \in L$.

The corresponding complex version of the Hahn-Banach theorem is 
given by 

Theorem 5'. Let $p$ be a finite convex functional defined on a complex
linear space $L$, and let $L_0$ be a subspace of $L$. Suppose $f_0$ is a linear
functional on $L_0$ satisfying the condition

$$|f_0(x)| \le p(x) \qquad (4')$$

on $L_0$. Then $f_0$ can be extended to a linear functional on $L$ satisfying (4')
on the whole space $L$.

Proof. Let $L_R$ and $L_{0R}$ denote the spaces $L$ and $L_0$, regarded as real
linear spaces. Clearly $p$ is a finite convex functional on $L_R$, while

$$f_{0R}(x) = \operatorname{Re} f_0(x)$$

is a real linear functional on $L_{0R}$ satisfying the condition

$$|f_{0R}(x)| \le p(x)$$

and hence (a fortiori) the condition

$$f_{0R}(x) \le p(x).$$

By Theorem 5, there exists a real linear functional $f_R$ defined on all of $L_R$,
satisfying the conditions

$$f_R(x) \le p(x) \quad \text{if } x \in L_R\ (= L),$$

$$f_R(x) = f_{0R}(x) \quad \text{if } x \in L_{0R}\ (= L_0).$$

Clearly

$$-f_R(x) = f_R(-x) \le p(-x) = p(x),$$

and hence

$$|f_R(x)| \le p(x) \quad \text{if } x \in L_R\ (= L). \qquad (9)$$

We now define the functional

$$f(x) = f_R(x) - i f_R(ix)$$

on $L$, using the fact that $L$ is a complex linear space in which multiplication
by complex numbers is defined. It is easily verified that $f$ is a complex
linear functional on $L$ such that

$$f(x) = f_0(x) \quad \text{if } x \in L_0,$$

$$\operatorname{Re} f(x) = f_R(x) \quad \text{if } x \in L.$$

Finally, to show that $|f(x)| \le p(x)$ for all $x \in L$, suppose to the contrary
that $|f(x_0)| > p(x_0)$ for some $x_0 \in L$. Writing $f(x_0) = \rho e^{i\varphi}$, where $\rho > 0$,
we set $y_0 = e^{-i\varphi} x_0$. Then

$$f_R(y_0) = \operatorname{Re} f(y_0) = \operatorname{Re}\big[e^{-i\varphi} f(x_0)\big] = \rho > p(x_0) = p(y_0),$$

which contradicts (9). ∎
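The formula $f(x) = f_R(x) - i f_R(ix)$, which recovers a complex linear functional from its real part, can be checked numerically. The Python sketch below is a hypothetical example: it takes an arbitrarily chosen complex linear functional on $C^2$, keeps only its real part $f_R$, and rebuilds $f$ from $f_R$ alone; `complexify`, `f_true` and the coefficients `a` are illustrative names and values, not taken from the text.

```python
def complexify(f_R):
    """Build f(x) = f_R(x) - i f_R(i x) from a real-linear functional f_R on C^n."""
    def f(x):
        ix = [1j * xk for xk in x]       # the vector i*x
        return complex(f_R(x), -f_R(ix))
    return f

# Hypothetical complex-linear functional on C^2 and its real part.
a = [2 - 1j, 0.5 + 3j]

def f_true(x):
    return sum(ak * xk for ak, xk in zip(a, x))

def f_R(x):
    return f_true(x).real                # only the real part is kept

f = complexify(f_R)
x = [1 + 2j, -3 + 0.5j]
print(f(x), f_true(x))                   # the two values agree up to rounding
```

The check works because $\operatorname{Re} f(ix) = \operatorname{Re}(i f(x)) = -\operatorname{Im} f(x)$, so subtracting $i f_R(ix)$ restores the imaginary part.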

14.5. Separation of convex sets in a linear space. Given a real linear space
$L$, let $M$ and $N$ be two subsets of $L$. Then a linear functional $f$ defined on
$L$ is said to separate $M$ and $N$ if there exists a number $C$ such that

$$f(x) \ge C \ \text{ if } x \in M,$$

$$f(x) \le C \ \text{ if } x \in N.$$

It follows at once from this definition that 

1) A linear functional $f$ separates two sets $M$ and $N$ if and only if it
separates $M - N = \{z : z = x - y,\ x \in M,\ y \in N\}$ and $\{0\}$, i.e., the set
consisting of all differences $x - y$ where $x \in M$, $y \in N$, and the set
whose only element is 0 (note that the minus sign in $M - N$ does not
have the usual meaning of a set difference);

2) A linear functional $f$ separates two sets $M$ and $N$ if and only if it
separates the sets $M - x_0 = \{z : z = x - x_0,\ x \in M\}$ and
$N - x_0 = \{z : z = y - x_0,\ y \in N\}$ for every $x_0 \in L$.

The following theorem on the separation of convex sets in a linear space 
has numerous applications and is an easy consequence of the Hahn-Banach 
theorem: 

Theorem 6. Let $M$ and $N$ be two disjoint convex sets in a real linear
space $L$, where at least one of the sets, say $M$, has a nonempty interior
(i.e., is a convex body). Then there exists a nontrivial linear functional $f$ on
$L$ separating $M$ and $N$.

Proof. There is no loss of generality in assuming that the point 0
belongs to the interior of $M$, since otherwise we need only consider the
sets $M - x_0 = \{z : z = x - x_0,\ x \in M\}$ and $N - x_0 = \{z : z = y - x_0,\ y \in N\}$,
where $x_0$ is some point of the interior of $M$. Let $y_0$ be a point of
$N$. Then the point $-y_0$ belongs to the interior of the set
$M - N = \{z : z = x - y,\ x \in M,\ y \in N\}$, and 0 belongs to the interior of the
convex set $M - N + y_0 = \{z : z = x - y + y_0,\ x \in M,\ y \in N\}$. Since $M$ and $N$ are
disjoint, we have $0 \notin M - N$, i.e., $y_0 \notin M - N + y_0$. Let $p$ be the Minkowski
functional for the set $M - N + y_0$. Then $p(y_0) \ge 1$ since $y_0 \notin M - N + y_0$.
Consider the linear functional

$$f_0(\alpha y_0) = \alpha\, p(y_0)$$

defined on the one-dimensional subspace of $L$ consisting of all elements
of the form $\alpha y_0$. Clearly $f_0$ satisfies the condition

$$f_0(\alpha y_0) \le p(\alpha y_0),$$

since

$$f_0(\alpha y_0) = \alpha\, p(y_0) = p(\alpha y_0) \quad \text{if } \alpha \ge 0,$$

while

$$f_0(\alpha y_0) = \alpha f_0(y_0) \le 0 \le p(\alpha y_0) \quad \text{if } \alpha < 0.$$

Hence, by the Hahn-Banach theorem, the functional $f_0$ can be extended
to a linear functional $f$ defined on the whole space $L$ and satisfying the
condition $f(y) \le p(y)$ on $L$. It follows that $f(y) \le 1$ if $y \in M - N + y_0$
(since $p(y) \le 1$ on this set), while at the same time $f(y_0) \ge 1$, i.e., $f$ separates
the sets $M - N + y_0$ and $\{y_0\}$. Therefore $f$ separates the sets $M - N$ and $\{0\}$.
But then $f$ separates the sets $M$ and $N$. ∎
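In the plane the conclusion of Theorem 6 can be seen directly. The Python sketch below works under assumptions of its own: $M$ and $N$ are two disjoint closed disks, and the separating functional is written down explicitly as $f(v) = (c_M - c_N, v)$ rather than obtained from the Hahn-Banach argument of the proof. The helper `separating_functional` and the sample disks are hypothetical.

```python
import math

def separating_functional(c_M, r_M, c_N, r_N):
    """Return (f, C) with f(x) >= C on the disk M and f(y) <= C on the disk N.

    Assumes the closed disks are disjoint, i.e. |c_M - c_N| > r_M + r_N.
    Uses f(v) = <c_M - c_N, v>, a direct construction for disks."""
    d = (c_M[0] - c_N[0], c_M[1] - c_N[1])
    norm_d = math.hypot(*d)

    def f(v):
        return d[0] * v[0] + d[1] * v[1]

    low_on_M = f(c_M) - r_M * norm_d     # minimum of f over M
    high_on_N = f(c_N) + r_N * norm_d    # maximum of f over N
    return f, 0.5 * (low_on_M + high_on_N)

f, C = separating_functional((3.0, 1.0), 1.0, (-1.0, 0.0), 2.0)

# Sample points of both disks (centers, inner circles, boundary circles).
angles = [k * math.pi / 8 for k in range(16)]
pts_M = [(3.0 + r * math.cos(t), 1.0 + r * math.sin(t)) for r in (0.0, 0.5, 1.0) for t in angles]
pts_N = [(-1.0 + r * math.cos(t), r * math.sin(t)) for r in (0.0, 1.0, 2.0) for t in angles]
print(all(f(x) >= C for x in pts_M), all(f(y) <= C for y in pts_N))  # True True
```

The gap $\min_M f - \max_N f = |c_M - c_N|\,(|c_M - c_N| - r_M - r_N)$ is positive exactly when the disks are disjoint, which is why any $C$ in between works.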

Problem 1. Let $M$ be the set of all points $x = (x_1, x_2, \dots, x_n, \dots)$ in $l_2$
satisfying the condition

$$\sum_{n=1}^{\infty} n^2 x_n^2 \le 1.$$

Prove that $M$ is a convex set, but not a convex body.

Problem 2. Give an example of two convex bodies whose intersection is 
not a convex body. 

Problem 3. We say that $n + 1$ points $x_1, x_2, \dots, x_{n+1}$ in a linear space $L$
are “in general position” if they do not belong to any $(n - 1)$-dimensional
subspace of $L$. The convex hull of a set of $n + 1$ points $x_1, x_2, \dots, x_{n+1}$
in general position is called an $n$-dimensional simplex, and the points
$x_1, x_2, \dots, x_{n+1}$ themselves are called the vertices of the simplex. Describe
the zero-dimensional, one-dimensional, two-dimensional and three-dimensional
simplexes in Euclidean three-space $R^3$. Prove that the simplex with vertices
$x_1, x_2, \dots, x_{n+1}$ is the set of all points in $L$ which can be represented in the
form

$$x = \sum_{k=1}^{n+1} \alpha_k x_k,$$

where

$$\alpha_k \ge 0, \qquad \sum_{k=1}^{n+1} \alpha_k = 1.$$

Problem 4. Show that if the points $x_1, x_2, \dots, x_{n+1}$ are in general position,
then so are any $k + 1$ ($k < n$) of them.

Comment. Hence the $k + 1$ points generate a $k$-dimensional simplex,
called a $k$-dimensional face of the $n$-dimensional simplex with vertices
$x_1, x_2, \dots, x_{n+1}$.

Problem 5. Describe all zero-dimensional, one-dimensional and two-dimensional
faces of the tetrahedron in $R^3$ with vertices $e_1, e_2, e_3, e_4$.

Problem 6. Show that in the Hahn-Banach theorem we can drop the 
condition that the functional p be finite. 

15. Normed Linear Spaces 

15.1. Definitions and examples. Chapters 2 and 3 deal with topological
(in particular, metric) spaces, i.e., spaces equipped with the notion of
closeness of elements, while Secs. 13 and 14 deal with linear spaces, i.e.,
spaces equipped with the operations of addition of elements and multiplication
of elements by numbers. We now combine these two ideas, arriving at
the notion of a topological linear space, equipped with a topology as well
as with the algebraic operations characterizing a linear space. In this section
and the next, we will study topological linear spaces of a particularly
important type, namely normed linear spaces and Euclidean spaces. Topological
linear spaces in general will be considered in Sec. 17.

Definition 1. A functional $p$ defined on a linear space $L$ is said to be
a norm (in $L$) if it has the following properties:

a) $p$ is finite and convex;

b) $p(x) = 0$ only if $x = 0$;

c) $p(\alpha x) = |\alpha|\, p(x)$ for all $x \in L$ and all $\alpha$.

Recalling the definition of a convex functional, we see that a norm in
$L$ is a finite functional on $L$ such that

1) $p(x) \ge 0$ for all $x \in L$, where $p(x) = 0$ if and only if $x = 0$;

2) $p(\alpha x) = |\alpha|\, p(x)$ for all $x \in L$ and all $\alpha$;

3) $p(x + y) \le p(x) + p(y)$ for all $x, y \in L$.

Definition 2. A linear space $L$, equipped with a norm $p(x) = \|x\|$, is
called a normed linear space.

The notation $\|x\|$ will henceforth be preferred for the norm of the element
$x \in L$. In terms of this notation, properties 1)-3) take the form:

1') $\|x\| \ge 0$ for all $x \in L$, where $\|x\| = 0$ if and only if $x = 0$;

2') $\|\alpha x\| = |\alpha|\, \|x\|$ for all $x \in L$ and all $\alpha$;

3') Triangle inequality. $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in L$.

Every normed linear space $L$ becomes a metric space if we set

$$\rho(x, y) = \|x - y\| \qquad (1)$$

for arbitrary $x, y \in L$. The fact that (1) is a metric follows at once from
properties 1')-3'). Thus everything said about metric spaces in Chap. 2
carries over to the case of normed linear spaces.

Many of the spaces considered in Chap. 2 as examples of metric spaces 
(or in Sec. 13 as examples of linear spaces) can be made into normed linear 
spaces in a natural way, as shown by the following examples (in each case, 
verify that the norm has all the required properties): 

8 One of the pioneer workers in this field was Stefan Banach (1892-1945), author of
the classic Théorie des Opérations Linéaires, reprinted by Chelsea Publishing Co., New
York (1955).
Example 1. The real line $R^1$ becomes a normed linear space if we set
$\|x\| = |x|$ for every number $x \in R^1$.

Example 2. To make real $n$-space $R^n$ into a normed linear space, we set

$$\|x\| = \sqrt{\sum_{k=1}^{n} x_k^2}$$

for every element $x = (x_1, x_2, \dots, x_n)$ in $R^n$. The formula

$$\rho(x, y) = \|x - y\| = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$$

then defines the same metric in $R^n$ as already considered in Example 3, p. 38.
Example 3. We can also equip real $n$-space with the norm

$$\|x\|_1 = \sum_{k=1}^{n} |x_k| \qquad (2)$$

or the norm

$$\|x\|_0 = \max_{1 \le k \le n} |x_k|. \qquad (3)$$

The corresponding metrics lead to the spaces $R_1^n$ and $R_0^n$ considered in
Examples 4 and 5, p. 39.
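The norms of Examples 2 and 3 are easy to compute side by side. The Python sketch below (helper names hypothetical, sample vectors arbitrary) evaluates them on vectors in $R^3$ and checks properties 2') and 3'); the accompanying test also checks the standard comparison $\|x\|_0 \le \|x\| \le \|x\|_1 \le n\|x\|_0$, a well-known fact not stated in the text.

```python
import math

def norm_2(x):
    """Norm of Example 2: sqrt(sum x_k^2)."""
    return math.sqrt(sum(t * t for t in x))

def norm_1(x):
    """Norm (2): sum |x_k|."""
    return sum(abs(t) for t in x)

def norm_0(x):
    """Norm (3): max |x_k|."""
    return max(abs(t) for t in x)

x, y = [3.0, -4.0, 1.0], [-1.0, 2.0, 2.0]
s = [a + b for a, b in zip(x, y)]
for norm in (norm_2, norm_1, norm_0):
    # Triangle inequality 3') and homogeneity 2') for each norm.
    print(norm.__name__, norm(s) <= norm(x) + norm(y),
          math.isclose(norm([-2.5 * t for t in x]), 2.5 * norm(x)))  # ... True True
```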

Example 4. The formula

$$\|x\| = \sqrt{\sum_{k=1}^{n} |x_k|^2}$$

introduces a norm in complex $n$-space $C^n$. Other possible norms in $C^n$ are
given by (2) and (3).

Example 5. The space $C_{[a,b]}$ of all functions continuous on the interval
$[a, b]$ can be equipped with the norm

$$\|f\| = \max_{a \le t \le b} |f(t)|.$$

The metric space corresponding to this norm has already been considered in
Example 6, p. 39.

Example 6. Let $m$ be the space of all bounded numerical sequences

$$x = (x_1, x_2, \dots, x_k, \dots),$$

and let

$$\|x\| = \sup_k |x_k|. \qquad (4)$$

Then (4) obviously has all the properties of a norm. The metric “induced”
by this norm is the same as that considered in Example 9, p. 41.

140 LINEAR SPACES 


CHAP. 4 


Example 7. A complete normed linear space, relative to the metric (1), is 
called a Banach space. It is easy to see that the spaces in Examples 1-6 are 
all Banach spaces (the details are left as an exercise). 

15.2. Subspaces of a normed linear space. In Sec. 13.3 we defined a
subspace of a linear space $L$ (unequipped with any topology) as a nonempty
set $L_0$ with the property that if $x, y \in L_0$, then $\alpha x + \beta y \in L_0$ for arbitrary $\alpha$
and $\beta$. The subspaces of greatest interest in a normed linear space are the
closed subspaces, i.e., those containing all their limit points. In the case of an
infinite-dimensional space, it is easy to give examples of subspaces that are
not closed:⁹

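Example 1. In the space $m$ of all bounded sequences, the sequences with
only finitely many nonzero terms form a subspace, but not a closed subspace,
since, for example, the closure of the subspace contains the sequence
$(1, 1/2, 1/3, \dots, 1/n, \dots)$, which has infinitely many nonzero terms (its
truncation after $n$ terms differs from it in norm by $1/(n+1) \to 0$).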
Example 2. The set $P_{[a,b]}$ of all polynomials defined on the interval $[a, b]$
is a subspace of $C_{[a,b]}$, but obviously not a closed subspace. On the other
hand, the closure of $P_{[a,b]}$ coincides with $C_{[a,b]}$, since every function continuous
on $[a, b]$ is the limit of a uniformly convergent sequence of polynomials,
by Weierstrass' approximation theorem.¹⁰
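Weierstrass' approximation theorem is cited here without proof. One classical constructive route, not necessarily the one in the cited reference, uses Bernstein polynomials on $[0, 1]$ (a general $[a, b]$ reduces to this by the change of variable $t \mapsto a + (b - a)t$). The Python sketch below, with hypothetical helper names, estimates the $C_{[0,1]}$ distance between $|t - 1/2|$ and its Bernstein polynomials; because the maximum is taken over a finite grid of sample points, the reported error is itself an approximation of the norm of Example 5.

```python
import math

def bernstein(f, n, t):
    """n-th Bernstein polynomial of f on [0, 1], evaluated at t:
    B_n f(t) = sum_{k=0}^{n} f(k/n) * C(n, k) * t^k * (1 - t)^(n - k)."""
    return sum(f(k / n) * math.comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

def sup_error(f, n, samples=201):
    """Approximate max_{0<=t<=1} |f(t) - B_n f(t)| on equally spaced samples."""
    ts = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(t) - bernstein(f, n, t)) for t in ts)

def g(t):
    return abs(t - 0.5)          # continuous on [0, 1] but not a polynomial

errors = {n: sup_error(g, n) for n in (10, 100, 400)}
print(errors)                    # the errors decrease as n grows
```

For this particular function the error shrinks roughly like $1/\sqrt{n}$, which illustrates uniform convergence but says nothing about rates for other functions.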

In what follows, we will be concerned as a rule with closed subspaces.
Hence it is natural to modify somewhat the terminology adopted in Sec. 13.3,
i.e., by a subspace of a normed linear space we will always mean a
closed subspace. In particular, by the subspace generated by a set of elements
$\{x_\alpha\}$ we will always mean the smallest closed subspace containing $\{x_\alpha\}$. This
subspace will also be called the linear closure of $\{x_\alpha\}$. The term linear manifold
will be reserved for a set of elements $L_0$ (not necessarily closed) such that
$x, y \in L_0$ implies $\alpha x + \beta y \in L_0$ for arbitrary numbers $\alpha$ and $\beta$. A set of
elements $\{x_\alpha\}$ in a normed linear space $L$ is said to be complete (in $L$) if the
linear closure of $\{x_\alpha\}$ coincides with $L$.

Remark. This is another meaning of the word “closed,” not to be confused 
with its meaning in Sec. 6.4. The context will always make it clear which 
meaning is intended. 

Example 3. By Weierstrass’ approximation theorem, the set of functions
$1, t, t^2, \dots, t^n, \dots$ is complete in $C_{[a,b]}$.

9 This contingency cannot arise in a finite-dimensional subspace (see Problem 5a). 

10 See e.g., G. P. Tolstov, Fourier Series (translated by R. A. Silverman), Prentice-Hall 
Inc., Englewood Cliffs, N.J. (1962), p. 120. 
Problem 1. A subset $M$ of a normed linear space $R$ is said to be bounded
if there is a constant $C$ such that $\|x\| \le C$ for all $x \in M$. Reconcile this with
Problem 5, p. 65.

Problem 2. Given a Banach space $R$, let $\{B_n\}$ be a nested sequence of
closed spheres in $R$ (so that $B_1 \supset B_2 \supset \cdots \supset B_n \supset \cdots$). Prove that $\bigcap_n B_n$
is nonempty (it is not assumed that the radius of $B_n$ approaches 0 as $n \to \infty$).
Give an example of a nested sequence $\{E_n\}$ of nonempty closed bounded
convex sets in a Banach space $R$ such that $\bigcap_n E_n$ is empty (cf. Problem 6,
p. 66).

Problem 3. Prove that the algebraic dimension (defined in Problem 4c, 
p. 128) of an infinite-dimensional Banach space is uncountable. 

Problem 4. Let $R$ be a Banach space, and let $M$ be a closed subspace of $R$.
Define a norm in the factor space $P = R/M$ by setting

$$\|\xi\| = \inf_{x \in \xi} \|x\|$$

for every element (residue class) $\xi \in P$. Prove that

a) $\|\xi\|$ is actually a norm in $P$;

b) The space $P$, equipped with this norm, is a Banach space.

Problem 5. Let $R$ be a normed linear space. Prove that

a) Every finite-dimensional linear subspace of $R$ is closed;

b) If $M$ is a closed subspace of $R$ and $N$ a finite-dimensional subspace
of $R$, then the set

$$M + N = \{z : z = x + y,\ x \in M,\ y \in N\} \qquad (5)$$

is a closed subspace of $R$;

c) If $Q$ is an open convex set in $R$ and $x_0 \notin Q$, then there exists a closed
hyperplane which passes through the point $x_0$ and does not intersect $Q$.

Problem 6. Let $x = (x_1, x_2, \dots, x_k, \dots)$ be an arbitrary element of $l_2$.
Prove that $l_2$ is a normed linear space when equipped with the norm

$$\|x\| = \sqrt{\sum_{k=1}^{\infty} x_k^2}.$$

Give an example of two closed linear subspaces $M$ and $N$ of $l_2$ whose “linear
sum” $M + N$ is not closed.

Problem 7. Two norms $\|\cdot\|_1$, $\|\cdot\|_2$ in a linear space $R$ are said to be
equivalent if there exist constants $a, b > 0$ such that

$$a\|x\|_1 \le \|x\|_2 \le b\|x\|_1$$

for all $x \in R$. Prove that if $R$ is finite-dimensional, then any two norms in
$R$ are equivalent.


16. Euclidean Spaces 

16.1. Scalar products. Orthogonality and bases. We begin with two key
definitions:

Definition 1. By a scalar product in a real linear space $R$ is meant a
real function defined for every pair of elements $x, y \in R$ and denoted by
$(x, y)$, with the following properties:

1) $(x, x) \ge 0$, where $(x, x) = 0$ if and only if $x = 0$;

2) $(x, y) = (y, x)$;

3) $(\lambda x, y) = \lambda (x, y)$;

4) $(x, y + z) = (x, y) + (x, z)$

(valid for all $x, y, z \in R$ and all real $\lambda$).

Definition 2. A linear space R equipped with a scalar product is called 
a Euclidean space. 

Lemma. Any two elements $x, y$ of a Euclidean space $R$ satisfy the
Schwarz inequality

$$|(x, y)| \le \|x\|\,\|y\|, \qquad (1)$$

where

$$\|x\| = \sqrt{(x, x)}, \qquad \|y\| = \sqrt{(y, y)}.$$

Proof. The quadratic polynomial

$$\varphi(\lambda) = (\lambda x + y, \lambda x + y) = \lambda^2 (x, x) + 2\lambda (x, y) + (y, y)
= \|x\|^2 \lambda^2 + 2(x, y)\lambda + \|y\|^2$$

is obviously nonnegative. Therefore

$$(x, y)^2 - \|x\|^2 \|y\|^2 \le 0, \qquad (2)$$

since otherwise $\varphi(\lambda)$ would become negative for some $\lambda$ (why?). But (2)
is equivalent to (1). ∎

We now use the scalar product in a Euclidean space $R$ to introduce a
norm in $R$:

Theorem 1. A Euclidean space $R$ becomes a normed linear space when
equipped with the norm

$$\|x\| = \sqrt{(x, x)} \qquad (x \in R).$$

Proof. Properties 1') and 2') on p. 138 are immediate consequences
of the definition of a scalar product. To prove property 3'), i.e., the
triangle inequality, we note that

$$\|x + y\|^2 = (x + y, x + y) = (x, x) + 2(x, y) + (y, y)
\le (x, x) + 2|(x, y)| + (y, y)
\le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,$$

because of the Schwarz inequality (1), and hence

$$\|x + y\| \le \|x\| + \|y\|.$$ ∎

The scalar product in $R$ can be used to define the angle between two
vectors as well as the length (i.e., norm) of a vector:

Definition 3. Given any two nonzero vectors $x$ and $y$ in a Euclidean space $R$,
the quantity $\theta$ defined by the formula

$$\cos\theta = \frac{(x, y)}{\|x\|\,\|y\|} \qquad (0 \le \theta \le \pi) \qquad (3)$$

is called the angle between $x$ and $y$.

Remark. It follows from Schwarz's inequality (1) that the absolute value of the
right-hand side of (3) cannot exceed 1. Therefore, given any nonzero $x$ and $y$, (3)
actually determines a unique angle in the interval $[0, \pi]$.
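In $R^n$ with the scalar product $(x, y) = \sum x_k y_k$ (Example 1 of Sec. 16.2), the Schwarz inequality, the triangle inequality and the angle (3) can all be checked numerically. The Python sketch below uses hypothetical helper names and sample vectors; the clamp of the cosine to $[-1, 1]$ only guards against floating-point rounding, since (1) already bounds it.

```python
import math

def dot(x, y):
    """Scalar product (x, y) = sum x_k y_k in R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm of Theorem 1: sqrt((x, x))."""
    return math.sqrt(dot(x, x))

def angle(x, y):
    """Angle (3) between nonzero vectors x and y, in [0, pi]."""
    cos_theta = dot(x, y) / (norm(x) * norm(y))
    # Schwarz (1) gives |cos_theta| <= 1; clamping only absorbs rounding error.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

x, y = [1.0, 2.0, 3.0], [-2.0, 0.5, 4.0]
print(abs(dot(x, y)) <= norm(x) * norm(y))                       # Schwarz inequality (1)
print(norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y))  # triangle inequality
print(angle([1.0, 0.0, 1.0], [0.0, 2.0, 0.0]))                   # orthogonal vectors: pi/2
```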

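Suppose $(x, y) = 0$, so that (3) implies $\theta = \pi/2$. Then the vectors $x$ and $y$
are said to be orthogonal. A set of nonzero vectors $\{x_\alpha\}$ in $R$ is said to be
an orthogonal system if

$$(x_\alpha, x_\beta) = 0 \quad \text{for } \alpha \ne \beta$$

and an orthonormal system if

$$(x_\alpha, x_\beta) = \begin{cases} 0 & \text{for } \alpha \ne \beta, \\ 1 & \text{for } \alpha = \beta. \end{cases}$$

If $\{x_\alpha\}$ is an orthogonal system, then clearly

$$\Big\{\frac{x_\alpha}{\|x_\alpha\|}\Big\}$$

is an orthonormal system.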

Theorem 2. The vectors in an orthogonal system $\{x_\alpha\}$ are linearly
independent.

Proof. Suppose

$$c_1 x_{\alpha_1} + c_2 x_{\alpha_2} + \cdots + c_n x_{\alpha_n} = 0.$$

Then, taking the scalar product with $x_{\alpha_i}$, we get

$$(x_{\alpha_i},\ c_1 x_{\alpha_1} + c_2 x_{\alpha_2} + \cdots + c_n x_{\alpha_n}) = c_i (x_{\alpha_i}, x_{\alpha_i}) = 0,$$

by the orthogonality of $\{x_\alpha\}$. But $(x_{\alpha_i}, x_{\alpha_i}) \ne 0$, and hence
$c_i = 0$ ($i = 1, 2, \dots, n$). ∎

An orthogonal system $\{x_\alpha\}$ is called an orthogonal basis if it is complete,
i.e., if the smallest closed subspace containing $\{x_\alpha\}$ is the whole space $R$.
Similarly, a complete orthonormal system is called an orthonormal basis.
16.2. Examples. We now give some examples of Euclidean spaces and
orthogonal bases in them:

Example 1. Let $R^n$ be real $n$-space, i.e., the set of all ordered $n$-tuples
$x = (x_1, x_2, \dots, x_n)$, $y = (y_1, y_2, \dots, y_n)$, ..., equipped with the same
algebraic operations as in Example 2, p. 119. Using the formula

$$(x, y) = \sum_{k=1}^{n} x_k y_k \qquad (4)$$

to define a scalar product in $R^n$, we get Euclidean $n$-space.¹¹ The corresponding
norm and distance in $R^n$ are

$$\|x\| = \sqrt{\sum_{k=1}^{n} x_k^2}$$

and

$$\rho(x, y) = \|x - y\| = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}. \qquad (5)$$

The vectors

$$e_1 = (1, 0, 0, \dots, 0),\quad e_2 = (0, 1, 0, \dots, 0),\quad \dots,\quad e_n = (0, 0, 0, \dots, 1)$$

form an orthonormal basis in $R^n$, one of infinitely many such bases.

Example 2. The space $l_2$ with elements $x = (x_1, x_2, \dots, x_k, \dots)$,
$y = (y_1, y_2, \dots, y_k, \dots)$, ..., where

$$\sum_{k=1}^{\infty} x_k^2 < \infty, \qquad \sum_{k=1}^{\infty} y_k^2 < \infty, \quad \dots,$$

becomes an infinite-dimensional Euclidean space when equipped with the
scalar product

$$(x, y) = \sum_{k=1}^{\infty} x_k y_k. \qquad (6)$$

The convergence of the right-hand side of (6) follows from the elementary
inequality

$$|x_k y_k| \le (|x_k| + |y_k|)^2 \le 2(x_k^2 + y_k^2),$$

and it is an easy matter to verify that (6) has all the properties of a scalar
product.

11 The term “Euclidean n-space” has already been used in Example 3, p. 38 to describe
the metric space with distance (5). In so doing, we anticipated the eventual introduction of
the scalar product (4).

The simplest orthonormal basis in $l_2$ consists of the vectors

$$e_1 = (1, 0, 0, \dots),\quad e_2 = (0, 1, 0, \dots),\quad e_3 = (0, 0, 1, \dots),\quad \dots \qquad (7)$$

The orthonormality of the system (7) is obvious. As for the completeness
of the system, given any vector $x = (x_1, x_2, \dots, x_k, \dots)$ in $l_2$, let

$$x^{(k)} = (x_1, x_2, \dots, x_k, 0, 0, \dots).$$

Then $x^{(k)}$ is a linear combination of the vectors $e_1, e_2, \dots, e_k$, and

$$\|x^{(k)} - x\|^2 = \sum_{n=k+1}^{\infty} x_n^2 \to 0 \quad \text{as } k \to \infty,$$

being the tail of a convergent series.

Example 3. The space $C^2_{[a,b]}$ consisting of all continuous functions on
$[a, b]$ equipped with the scalar product

$$(f, g) = \int_a^b f(t)\, g(t)\, dt$$

is another example of a Euclidean space. Among the various orthogonal
bases in $C^2_{[a,b]}$, one of the most important is the system of trigonometric
functions

$$1,\quad \cos\frac{2\pi n t}{b - a},\quad \sin\frac{2\pi n t}{b - a} \qquad (n = 1, 2, \dots). \qquad (8)$$

The orthogonality of this system can be verified by a simple calculation.
Making the choice $a = -\pi$, $b = \pi$, we simplify (8) to

$$1,\quad \cos nt,\quad \sin nt \qquad (n = 1, 2, \dots). \qquad (8')$$

Thus (8') is an orthogonal system in the space $C^2_{[-\pi,\pi]}$. As for its completeness,
we have

Theorem 3. The system (8) is complete in $C^2_{[a,b]}$.

Proof. By another version of Weierstrass' approximation theorem,¹²
every function $\varphi$ continuous on the interval $[a, b]$ and such that $\varphi(a) = \varphi(b)$
is the limit of a uniformly convergent sequence of trigonometric
polynomials, i.e., linear combinations of elements of the system (8).
This sequence converges (a fortiori) to $\varphi$ in the norm of the space $C^2_{[a,b]}$.
But an arbitrary function $f \in C^2_{[a,b]}$ can be represented as the limit in the
$C^2_{[a,b]}$ norm of a sequence of functions $\{\varphi_n\}$, where

$$\varphi_n(x) = \begin{cases} f(x) & \text{if } a \le x \le b - \frac{1}{n}, \\[4pt] n\Big[f\Big(b - \frac{1}{n}\Big) - f(a)\Big](b - x) + f(a) & \text{if } b - \frac{1}{n} < x \le b. \end{cases}$$

[Figure 16]

The function $\varphi_n$ coincides with $f$ in the interval $[a, b - 1/n]$, is linear on
$[b - 1/n, b]$ and takes the same value at the point $b$ as at the point $a$
(see Figure 16). Moreover, $|f - \varphi_n| \le 2\max|f|$, and $f - \varphi_n$ vanishes outside
an interval of length $1/n$, so that $\|f - \varphi_n\|^2 \le 4(\max|f|)^2/n \to 0$.
Hence every element of $C^2_{[a,b]}$ can be approximated arbitrarily closely
(in the $C^2_{[a,b]}$ norm) by a linear combination of elements of the system
(8). ∎

12 See e.g., G. P. Tolstov, op. cit., Corollary 1, p. 117.
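The "simple calculation" behind the orthogonality of (8') can be mimicked numerically. The Python sketch below (hypothetical helpers `inner` and `trig_system`) forms the Gram matrix $(\varphi_i, \varphi_j)$ of $1, \cos t, \sin t, \dots, \cos 3t, \sin 3t$ on $[-\pi, \pi]$ with the midpoint rule; for trigonometric polynomials of such low degree over a full period this rule is exact up to rounding, which is why 64 nodes suffice.

```python
import math

def inner(f, g, a=-math.pi, b=math.pi, m=64):
    """(f, g) = integral of f(t) g(t) over [a, b], by the midpoint rule with m nodes.

    For trigonometric polynomials of degree well below m/2 on a full period,
    the midpoint rule is exact up to rounding error."""
    h = (b - a) / m
    nodes = [a + (j + 0.5) * h for j in range(m)]
    return h * sum(f(t) * g(t) for t in nodes)

def trig_system(n_max):
    """The functions 1, cos t, sin t, ..., cos(n_max t), sin(n_max t) of (8')."""
    funcs = [lambda t: 1.0]
    for n in range(1, n_max + 1):
        funcs.append(lambda t, n=n: math.cos(n * t))   # bind n now, not at call time
        funcs.append(lambda t, n=n: math.sin(n * t))
    return funcs

system = trig_system(3)
gram = [[inner(f, g) for g in system] for f in system]
for row in gram:
    print(["%.6f" % v for v in row])
```

Off-diagonal entries vanish up to rounding, while the diagonal holds $2\pi$ for the constant function and $\pi$ for the others; dividing by $\sqrt{2\pi}$ and $\sqrt{\pi}$ therefore turns (8') into an orthonormal system.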


16.3. Existence of an orthogonal basis. Orthogonalization. From now on,
we will be mainly concerned with the case of separable Euclidean spaces,
i.e., Euclidean spaces containing a countable everywhere dense subset. For
example, the spaces $R^n$, $l_2$ and $C^2_{[a,b]}$ are all separable, as shown in Sec. 6.3.
An example of a nonseparable Euclidean space is given in Problem 2.

Theorem 4. Every orthogonal system $\{x_\alpha\}$ in a separable Euclidean
space $R$ has no more than countably many elements $x_\alpha$.

Proof. There is no loss of generality in assuming that the system
$\{x_\alpha\}$ is orthonormal as well as orthogonal, since otherwise we need only
replace $\{x_\alpha\}$ by

$$\Big\{\frac{x_\alpha}{\|x_\alpha\|}\Big\}.$$

We then have $\|x_\alpha - x_\beta\|^2 = (x_\alpha, x_\alpha) + (x_\beta, x_\beta) = 2$, i.e.,

$$\|x_\alpha - x_\beta\| = \sqrt{2} \quad \text{if } \alpha \ne \beta. \qquad (9)$$

Consider the set of open spheres $S(x_\alpha, \tfrac12)$. These spheres are pairwise
disjoint, because of (9). Moreover, each sphere contains at least one
element from some countable subset $\{y_n\}$ everywhere dense in $R$. Consequently
there are no more than countably many such spheres, and hence
no more than countably many elements $x_\alpha$. ∎

We have already exhibited an orthogonal basis in each of the spaces $R^n$,
$l_2$ and $C^2_{[-\pi,\pi]}$. The existence of an orthogonal basis in any separable Euclidean
space is guaranteed by the following theorem and its corollary, analogous
to the theorem on the existence of an orthogonal basis in any finite-dimensional
Euclidean space:¹³

Theorem 5 (Orthogonalization theorem). Let

$$f_1, f_2, \dots, f_n, \dots \qquad (10)$$

be any (countable) set of linearly independent elements of a Euclidean
space $R$. Then $R$ contains a set of elements

$$\varphi_1, \varphi_2, \dots, \varphi_n, \dots \qquad (11)$$

such that

1) The system (11) is orthonormal;

2) Every element $\varphi_n$ is a linear combination

$$\varphi_n = a_{n1} f_1 + a_{n2} f_2 + \cdots + a_{nn} f_n \qquad (a_{nn} \ne 0)$$

of the elements $f_1, f_2, \dots, f_n$;

3) Every element $f_n$ is a linear combination

$$f_n = b_{n1} \varphi_1 + b_{n2} \varphi_2 + \cdots + b_{nn} \varphi_n \qquad (b_{nn} \ne 0)$$

of the elements $\varphi_1, \varphi_2, \dots, \varphi_n$.

Moreover, every element of the system (11) is uniquely determined by these
conditions to within a factor of $\pm 1$.

Proof. First we construct $\varphi_1$. Setting

$$\varphi_1 = a_{11} f_1,$$

we determine $a_{11}$ from the condition

$$(\varphi_1, \varphi_1) = a_{11}^2 (f_1, f_1) = 1,$$

which implies

$$a_{11} = \pm\frac{1}{\sqrt{(f_1, f_1)}}.$$

This obviously determines $\varphi_1$ uniquely (except for sign).

Next suppose elements $\varphi_1, \varphi_2, \dots, \varphi_{n-1}$ satisfying the conditions of
the theorem have already been constructed. Then $f_n$ can be written in the
form

$$f_n = b_{n1} \varphi_1 + \cdots + b_{n,n-1} \varphi_{n-1} + h_n, \qquad (12)$$

where

$$(h_n, \varphi_k) = 0 \qquad (k = 1, 2, \dots, n - 1).$$

In fact, the coefficients $b_{nk}$ and hence the element $h_n$ are uniquely
determined by the conditions

$$(h_n, \varphi_k) = (f_n - b_{n1}\varphi_1 - \cdots - b_{n,n-1}\varphi_{n-1},\ \varphi_k) = (f_n, \varphi_k) - b_{nk}(\varphi_k, \varphi_k) = 0,$$

i.e.,

$$b_{nk} = (f_n, \varphi_k) \qquad (k = 1, 2, \dots, n - 1).$$

Clearly $(h_n, h_n) > 0$, since $(h_n, h_n) = 0$ contradicts the assumed linear
independence of the elements (10). Let

$$\varphi_n = \frac{h_n}{\sqrt{(h_n, h_n)}}. \qquad (13)$$

Using (12) and (13), we express $h_n$ and hence $\varphi_n$ in terms of the elements
$f_1, f_2, \dots, f_n$, i.e.,

$$\varphi_n = a_{n1} f_1 + a_{n2} f_2 + \cdots + a_{nn} f_n,$$

where

$$a_{nn} = \frac{1}{\sqrt{(h_n, h_n)}} \ne 0.$$

Moreover

$$(\varphi_n, \varphi_k) = 0 \quad (k = 1, 2, \dots, n - 1), \qquad (\varphi_n, \varphi_n) = 1,$$

$$f_n = b_{n1}\varphi_1 + b_{n2}\varphi_2 + \cdots + b_{nn}\varphi_n, \qquad b_{nn} = \sqrt{(h_n, h_n)} \ne 0.$$

Thus, starting from elements $\varphi_1, \varphi_2, \dots, \varphi_{n-1}$ satisfying the conditions
of the theorem, we have constructed elements $\varphi_1, \varphi_2, \dots, \varphi_{n-1}, \varphi_n$
satisfying the same conditions. The proof now follows by mathematical
induction. ∎

13 See e.g., G. E. Shilov, An Introduction to the Theory of Linear Spaces (translated by
R. A. Silverman), Prentice-Hall, Inc., Englewood Cliffs, N.J. (1961), Theorem 28, p. 142.


Remark. The process leading from the linearly independent elements (10)
to the orthonormal system (11) is called orthogonalization. It is clear that
the subspace generated by (10) coincides with that generated by (11).
Hence the set (10) is complete if and only if the set (11) is complete.
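The inductive construction in the proof of Theorem 5 is the Gram-Schmidt process. The Python sketch below carries it out in $R^n$ with the scalar product (4) of Sec. 16.2, following formulas (12) and (13) literally and taking the $+$ sign at each step; the function name `orthonormalize` and the tolerance `tol` used to detect linear dependence are assumptions for illustration. It returns both the $\varphi_n$ and the coefficients $b_{nk}$ of condition 3).

```python
import math

def dot(x, y):
    """Scalar product (4) in R^n."""
    return sum(a * b for a, b in zip(x, y))

def orthonormalize(fs, tol=1e-24):
    """Orthogonalize f_1, f_2, ... following (12)-(13) in the proof of Theorem 5.

    Returns (phis, b) with phis orthonormal and f_n = sum_{k<=n} b[n][k] * phis[k].
    The + sign is chosen in (13), so b[n][n] = sqrt((h_n, h_n)) > 0.
    tol is an illustrative threshold for detecting linear dependence."""
    phis, b = [], []
    for f in fs:
        coeffs = [dot(f, phi) for phi in phis]            # b_nk = (f_n, phi_k)
        h = list(f)
        for c, phi in zip(coeffs, phis):                  # h_n = f_n - sum b_nk phi_k
            h = [hk - c * pk for hk, pk in zip(h, phi)]
        hh = dot(h, h)
        if hh <= tol:
            raise ValueError("the elements are (numerically) linearly dependent")
        phis.append([hk / math.sqrt(hh) for hk in h])     # formula (13)
        b.append(coeffs + [math.sqrt(hh)])                # b_nn
    return phis, b

fs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
phis, b = orthonormalize(fs)
for phi in phis:
    print([round(t, 4) for t in phi])
```

In floating point the "modified" variant, which projects the running remainder $h$ instead of $f_n$, is usually preferred for numerical stability; the version above mirrors the text's formula $b_{nk} = (f_n, \varphi_k)$.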

Corollary. Every separable Euclidean space $R$ has a countable
orthonormal basis.

Proof. Let $g_1, g_2, \dots, g_n, \dots$ be a countable everywhere dense
subset of $R$. Then a complete set of linearly independent elements
$f_1, f_2, \dots, f_n, \dots$ can be selected from $\{g_n\}$. In fact, we need only eliminate
from the sequence $\{g_n\}$ all elements $g_k$ which can be written as linear
combinations of elements $g_i$ with smaller indices ($i < k$). Applying the
orthogonalization process to $f_1, f_2, \dots, f_n, \dots$, we get an orthonormal
basis. ∎


16.4. Bessel's inequality. Closed orthogonal systems. Let $e_1, e_2, \dots, e_n$
be an orthonormal basis in $R^n$. Then every vector $x \in R^n$ can be written in
the form

$$x = \sum_{k=1}^{n} c_k e_k,$$

where

$$c_k = (x, e_k).$$

We now show how this generalizes to the case of an infinite-dimensional
Euclidean space $R$. Let $\varphi_1, \varphi_2, \dots, \varphi_k, \dots$ be an orthonormal system in
$R$, and let $f$ be an arbitrary element of $R$. Suppose that with $f$ we associate

1) The sequence of numbers

$$c_k = (f, \varphi_k) \qquad (k = 1, 2, \dots), \qquad (14)$$

called the components or Fourier coefficients of $f$ with respect to the
system $\{\varphi_k\}$;

2) The series

$$\sum_{k=1}^{\infty} c_k \varphi_k \qquad (15)$$

(for the time being, purely formal), called the Fourier series of $f$ with
respect to the system $\{\varphi_k\}$.

Then it is natural to ask whether the series (15) converges,¹⁴ and if so,
whether the sum of the series coincides with the original element $f$. To
answer these questions, we first prove

Theorem 6. Given an orthonormal system

$$\varphi_1, \varphi_2, \dots, \varphi_k, \dots \qquad (16)$$

in a Euclidean space $R$, let $f$ be an arbitrary element of $R$. Then the
expression

$$\Big\| f - \sum_{k=1}^{n} a_k \varphi_k \Big\|$$

achieves its minimum for

$$a_k = c_k = (f, \varphi_k) \qquad (k = 1, 2, \dots, n).$$

The square of this minimum equals

$$\|f\|^2 - \sum_{k=1}^{n} c_k^2.$$

Moreover

$$\sum_{k=1}^{\infty} c_k^2 \le \|f\|^2, \qquad (17)$$

a result known as Bessel's inequality.

14 More exactly, whether the sequence of partial sums corresponding to (15) converges
in the metric of $R$.

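Proof. Let

$$S_n = \sum_{k=1}^{n} a_k \varphi_k. \qquad (18)$$

Then, by the orthonormality of (16),

$$\|f - S_n\|^2 = \Big(f - \sum_{k=1}^{n} a_k\varphi_k,\ f - \sum_{k=1}^{n} a_k\varphi_k\Big)
= (f, f) - 2\Big(f, \sum_{k=1}^{n} a_k\varphi_k\Big) + \Big(\sum_{k=1}^{n} a_k\varphi_k, \sum_{l=1}^{n} a_l\varphi_l\Big)
= \|f\|^2 - 2\sum_{k=1}^{n} a_k c_k + \sum_{k=1}^{n} a_k^2,$$

or

$$\|f - S_n\|^2 = \|f\|^2 - \sum_{k=1}^{n} c_k^2 + \sum_{k=1}^{n} (a_k - c_k)^2, \qquad (19)$$

where

$$c_k = (f, \varphi_k) \qquad (k = 1, 2, \dots, n).$$

The expression in the right-hand side of (19) obviously achieves its minimum
when its last term vanishes, i.e., when

$$a_k = c_k \qquad (k = 1, 2, \dots, n),$$

and this minimum is just

$$\|f - S_n\|^2 = \|f\|^2 - \sum_{k=1}^{n} c_k^2. \qquad (20)$$

Moreover, since $\|f - S_n\|^2 \ge 0$, it follows from (20) that

$$\sum_{k=1}^{n} c_k^2 \le \|f\|^2 \qquad (21)$$

for every $n$. Hence the series

$$\sum_{k=1}^{\infty} c_k^2$$

is convergent. Taking the limit as $n \to \infty$ in (21), we get (17). ∎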
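Theorem 6 can be checked on a small example. The Python sketch below uses hypothetical example data: an orthonormal, but not complete, system of two vectors in $R^4$ and an arbitrary element $f$. It computes the Fourier coefficients (14) and verifies Bessel's inequality (17) and formula (20); the accompanying test also checks formula (19) for perturbed coefficients.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sq_dist(f, coeffs, phis):
    """||f - sum_k a_k phi_k||^2 for coefficients a_k = coeffs[k]."""
    d = [fi - sum(a * phi[i] for a, phi in zip(coeffs, phis))
         for i, fi in enumerate(f)]
    return dot(d, d)

# A hypothetical orthonormal (not complete) system in R^4 and an element f.
r = 1 / math.sqrt(2)
phis = [[r, r, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
f = [3.0, 1.0, -2.0, 5.0]

c = [dot(f, phi) for phi in phis]          # Fourier coefficients (14)
bessel_sum = sum(ck * ck for ck in c)
best = sq_dist(f, c, phis)
print(bessel_sum, dot(f, f))               # about 12 and 39: Bessel's inequality (17)
print(best)                                # about 27 = 39 - 12, as in (20)
```

Here $c_1 = 2\sqrt{2}$ and $c_2 = -2$, so $\sum c_k^2 = 12 < 39 = \|f\|^2$; the inequality is strict because the system cannot represent the fourth coordinate of $f$.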
Remark. Geometrically, Bessel's inequality (17) means that the sum of
the squares of the projections of a vector $f$ onto a set of mutually perpendicular
directions cannot exceed the square of the length of the vector itself. For a
geometric interpretation of the rest of Theorem 6, see Problems 5 and 6.

The case where Bessel’s inequality becomes an equality is particularly 
important: 

Definition 4. Suppose equality holds in (17) for every $f \in R$, i.e.,
suppose

$$\sum_{k=1}^{\infty} c_k^2 = \|f\|^2 \qquad (22)$$

for every $f \in R$. Then the orthonormal system $\varphi_1, \varphi_2, \dots, \varphi_n, \dots$ is said
to be closed.

Remark. This is another meaning of the word “closed,” not to be
confused with its meaning in Sec. 6.4. The context will always make it
clear which meaning is intended. Formula (22) is known as Parseval's
theorem.

Theorem 7. An orthonormal system $\varphi_1, \varphi_2, \ldots, \varphi_n, \ldots$ in a Euclidean space $R$ is closed if and only if every element $f \in R$ is the sum of its Fourier series.

Proof. According to Definition 4, the system $\{\varphi_k\}$ is closed if and only if (22) holds for every $f \in R$. Taking the limit as $n \to \infty$ in (20) and using (18), we see that (22) holds for every $f \in R$ if and only if
$$\lim_{n \to \infty} \Bigl\| f - \sum_{k=1}^{n} c_k \varphi_k \Bigr\| = 0,$$
or equivalently
$$f = \sum_{k=1}^{\infty} c_k \varphi_k,$$
for every $f \in R$. ∎
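In a finite-dimensional space every orthonormal basis is closed, so Parseval's theorem (22) and Theorem 7 can be checked directly there. The sketch below is an assumed illustration, not part of the original text: it uses the orthonormal basis of $R^4$ obtained by halving the rows of a $4 \times 4$ Hadamard matrix, verifies that $\sum c_k^2 = \|f\|^2$ and that the Fourier series reproduces $f$, and shows that after dropping $\varphi_4$ the remaining system is not closed, so Bessel's inequality becomes strict. All quantities here are exact binary fractions, so exact comparisons are safe in this particular example.

```python
def dot(x, y):
    """Ordinary scalar product in R^4."""
    return sum(a * b for a, b in zip(x, y))

# Rows of a 4x4 Hadamard matrix have norm 2; halving them gives an
# orthonormal basis of R^4 (an illustrative choice of closed system).
hadamard = [[1, 1, 1, 1],
            [1, -1, 1, -1],
            [1, 1, -1, -1],
            [1, -1, -1, 1]]
phis = [[entry / 2 for entry in row] for row in hadamard]

f = [3.0, -1.0, 2.0, 0.5]
c = [dot(f, phi) for phi in phis]  # Fourier coefficients c_k = (f, phi_k)

# Parseval's theorem (22): equality holds for the full (closed) system.
parseval_gap = dot(f, f) - sum(ck ** 2 for ck in c)

# Theorem 7: f is the sum of its Fourier series.
series_sum = [sum(ck * phi[i] for ck, phi in zip(c, phis)) for i in range(4)]

# Without phi_4 the system is not closed, and Bessel's inequality is strict.
bessel_gap = dot(f, f) - sum(ck ** 2 for ck in c[:3])

print(c)             # [2.25, 2.75, -0.25, 1.25]
print(parseval_gap)  # 0.0
print(series_sum)    # [3.0, -1.0, 2.0, 0.5]
print(bessel_gap)    # 1.5625
```

The strict gap $1.5625 = c_4^2$ is exactly the contribution of the discarded basis vector.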

The properties of being complete and being closed are intimately connected, as shown by

Theorem 8. An orthonormal system $\varphi_1, \varphi_2, \ldots, \varphi_n, \ldots$ in a Euclidean space $R$ is complete if and only if it is closed.

Proof. Suppose $\{\varphi_k\}$ is closed. Then, by Theorem 7, every element $f \in R$ is the limit of the partial sums of its Fourier series. In other words, linear combinations of elements of $\{\varphi_k\}$ are everywhere dense in $R$, i.e., $\{\varphi_k\}$ is complete.
Conversely, suppose $\{\varphi_k\}$ is complete. Then every element $f \in R$ can be approximated arbitrarily closely by a linear combination
$$\sum_{k=1}^{n} a_k \varphi_k$$
of elements of $\{\varphi_k\}$. But, by Theorem 6, the partial sum
$$\sum_{k=1}^{n} c_k \varphi_k$$
of the Fourier series of $f$ is at least as good an approximation, and by (20) so is every later partial sum, since $\|f\|^2 - \sum_{k=1}^{m} c_k^2$ does not increase with $m$. Hence $f$ is the sum of its own Fourier series. It follows from Theorem 7 that $\{\varphi_k\}$ is closed. ∎

Corollary. Every separable Euclidean space $R$ contains a closed orthonormal system $\varphi_1, \varphi_2, \ldots, \varphi_n, \ldots$.

Proof. An immediate consequence of Theorem 8 and the corollary to Theorem 5. ∎

Example 1. The orthonormal system (7) is closed in $l_2$.

Remark. In introducing the concepts of Fourier coefficients and Fourier series, we assumed that the system $\{\varphi_k\}$ is orthonormal. More generally, suppose $\{\varphi_k\}$ is orthogonal but not orthonormal, and let
$$\psi_k = \frac{\varphi_k}{\|\varphi_k\|}.$$
Then the system $\{\psi_k\}$ is orthonormal. Given any $f \in R$, let
$$c_k = (f, \psi_k) = \frac{(f, \varphi_k)}{\|\varphi_k\|},$$
and consider the series
$$\sum_{k=1}^{\infty} c_k \psi_k = \sum_{k=1}^{\infty} \frac{c_k}{\|\varphi_k\|}\, \varphi_k = \sum_{k=1}^{\infty} a_k \varphi_k,$$
where
$$a_k = \frac{c_k}{\|\varphi_k\|} = \frac{(f, \varphi_k)}{\|\varphi_k\|^2}. \tag{23}$$
Then the coefficients (23) are called the Fourier coefficients of the element $f \in R$ with respect to the orthogonal (but not orthonormal) system $\{\varphi_k\}$. Substituting $c_k = a_k \|\varphi_k\|$ into (17), we get the following version of Bessel's inequality for arbitrary orthogonal systems:
$$\sum_{k=1}^{\infty} a_k^2 \|\varphi_k\|^2 \le \|f\|^2. \tag{17'}$$
If equality holds in (17') for every $f \in R$, the orthogonal system $\{\varphi_k\}$ is said to be closed, just as in Definition 4.
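The coefficients (23) and inequality (17') can be illustrated with a minimal sketch in $R^3$ with the ordinary dot product. The orthogonal, non-normalized system and the vector $f$ below are assumed data chosen only for the example. For the full system (a basis of $R^3$) equality holds in (17'); for the first two vectors alone the inequality is strict.

```python
def dot(x, y):
    """Ordinary scalar product in R^3."""
    return sum(a * b for a, b in zip(x, y))

# An orthogonal but not orthonormal system in R^3 (illustrative choice).
phis = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0]]
f = [4.0, 2.0, -6.0]

# Fourier coefficients (23): a_k = (f, phi_k) / ||phi_k||^2
a = [dot(f, p) / dot(p, p) for p in phis]

# Left-hand side of (17') for the whole system and for its first two vectors.
full = sum(ak ** 2 * dot(p, p) for ak, p in zip(a, phis))
partial = sum(ak ** 2 * dot(p, p) for ak, p in zip(a[:2], phis[:2]))

print(a)                          # [3.0, 1.0, -3.0]
print(full, partial, dot(f, f))   # 56.0 20.0 56.0
```

Note that $a_k$ is divided by $\|\varphi_k\|^2$, not by $\|\varphi_k\|$; with the latter one would obtain the coefficients $c_k$ relative to the normalized system $\{\psi_k\}$.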

Example 2. The orthogonal system (8) is closed in $C^2_{[-\pi, \pi]}$.

16.5. Complete Euclidean spaces. The Riesz-Fischer theorem. Given a Euclidean space $R$, let $\{\varphi_k\}$ be an orthonormal (but not necessarily complete) system in $R$. It follows from Bessel's inequality that a necessary condition for the numbers $c_1, c_2, \ldots, c_k, \ldots$ to be the Fourier coefficients of an element $f \in R$ is that the series
$$\sum_{k=1}^{\infty} c_k^2$$
converge. It turns out that this condition is also sufficient if $R$ is complete, as shown by

Theorem 9 (Riesz-Fischer). Given an orthonormal system $\{\varphi_k\}$ in a complete Euclidean space $R$, let the numbers $c_1, c_2, \ldots, c_k, \ldots$ be such that
$$\sum_{k=1}^{\infty} c_k^2 \tag{24}$$
converges. Then there exists an element $f \in R$ with $c_1, c_2, \ldots, c_k, \ldots$ as its Fourier coefficients, i.e., such that
$$\sum_{k=1}^{\infty} c_k^2 = \|f\|^2,$$
where
$$c_k = (f, \varphi_k) \qquad (k = 1, 2, \ldots).$$

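Proof. Writing
$$f_n = \sum_{k=1}^{n} c_k \varphi_k,$$
we have
$$\|f_{n+p} - f_n\|^2 = \|c_{n+1}\varphi_{n+1} + \cdots + c_{n+p}\varphi_{n+p}\|^2 = \sum_{k=n+1}^{n+p} c_k^2.$$
Hence $\{f_n\}$ is a Cauchy sequence and converges to some element $f \in R$, by the convergence of (24) and the completeness of $R$. Moreover,
$$(f, \varphi_k) = (f_n, \varphi_k) + (f - f_n, \varphi_k), \tag{25}$$
where the first term on the right equals $c_k$ if $n \ge k$ and the second term approaches zero as $n \to \infty$, since
$$|(f - f_n, \varphi_k)| \le \|f - f_n\|\, \|\varphi_k\|.$$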
Taking the limit as $n \to \infty$ in (25), we get
$$(f, \varphi_k) = c_k,$$
since the left-hand side is independent of $n$. Moreover,
$$\|f - f_n\| \to 0$$
as $n \to \infty$, and hence
$$\Bigl(f - \sum_{k=1}^{n} c_k \varphi_k,\; f - \sum_{k=1}^{n} c_k \varphi_k\Bigr) = (f, f) - \sum_{k=1}^{n} c_k^2 \to 0$$
as $n \to \infty$, i.e.,
$$\lim_{n \to \infty} \sum_{k=1}^{n} c_k^2 = \sum_{k=1}^{\infty} c_k^2 = \|f\|^2. \qquad \blacksquare$$
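The key step in the proof is that orthonormality turns $\|f_{n+p} - f_n\|^2$ into a block of terms of the convergent series (24). Since that quantity depends only on the coefficients, the following sketch works with coefficients alone. It assumes the hypothetical square-summable choice $c_k = 1/k$ and checks, for a truncated list of coefficients, that these blocks are small, using the elementary bound $\sum_{k>n} 1/k^2 < 1/n$ (which follows from $1/k^2 < 1/(k-1) - 1/k$).

```python
def gap_sq(c, n, p):
    """||f_{n+p} - f_n||^2 = c_{n+1}^2 + ... + c_{n+p}^2 for an orthonormal system.

    The list c is 0-based, so c[k - 1] holds the coefficient c_k.
    """
    return sum(c[k - 1] ** 2 for k in range(n + 1, n + p + 1))

N = 10_000
c = [1.0 / k for k in range(1, N + 1)]  # hypothetical coefficients c_k = 1/k

for n in (10, 100, 1000):
    g = gap_sq(c, n, N - n)
    # The tail bound sum_{k > n} 1/k^2 < 1/n shows that {f_n} is a Cauchy sequence.
    print(n, g, g < 1.0 / n)
```

Only the coefficient sequence enters the computation, which reflects why completeness of $R$ is the one extra ingredient the theorem needs.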

Theorem 10. Let $\{\varphi_k\}$ be an orthonormal system in a complete Euclidean space $R$. Then $\{\varphi_k\}$ is complete if and only if $R$ contains no nonzero element orthogonal to all the elements of $\{\varphi_k\}$.

Proof. Suppose $\{\varphi_k\}$ is complete and hence closed (by Theorem 8), and suppose $f$ is orthogonal to all the elements of $\{\varphi_k\}$. Then all the Fourier coefficients of $f$ vanish. Hence
$$\|f\|^2 = \sum_{k=1}^{\infty} c_k^2 = 0$$
by Parseval's theorem (22), i.e., $f = 0$.

Conversely, suppose $\{\varphi_k\}$ is not complete. Then $R$ contains an element $g \ne 0$ such that
$$\|g\|^2 > \sum_{k=1}^{\infty} c_k^2, \qquad \text{where } c_k = (g, \varphi_k)$$
(why?). By the Riesz-Fischer theorem, there exists an element $f \in R$ such that
$$(f, \varphi_k) = c_k, \qquad \|f\|^2 = \sum_{k=1}^{\infty} c_k^2.$$
But $f - g$ is orthogonal to all the $\varphi_k$, by construction. Moreover, it follows from
$$\|f\|^2 = \sum_{k=1}^{\infty} c_k^2 < \|g\|^2$$
that $f - g \ne 0$. ∎

16.6. Hilbert space. The isomorphism theorem. Continuing our study of complete Euclidean spaces, we concentrate our attention on infinite-dimensional spaces, since finite-dimensional spaces are considered in great detail in courses on linear algebra.

Definition 5. By a Hilbert space¹⁵ is meant a Euclidean space which is complete, separable and infinite-dimensional.

In other words, a Hilbert space is a set $H$ of elements $f, g, \ldots$ of any kind such that

1) $H$ is a Euclidean space, i.e., a real linear space¹⁶ equipped with a scalar product;

2) $H$ is complete with respect to the metric $\rho(f, g) = \|f - g\|$;

3) $H$ is separable, i.e., $H$ contains a countable everywhere dense subset;

4) $H$ is infinite-dimensional, i.e., given any positive integer $n$, $H$ contains $n$ linearly independent elements.

Example. The real space $l_2$ is a Hilbert space (check all the properties).

Definition 6. Two Euclidean spaces $R$ and $R^*$ are said to be isomorphic (to each other) if there is a one-to-one correspondence $x \leftrightarrow x^*$, $y \leftrightarrow y^*$ between the elements of $R$ and those of $R^*$ ($x, y \in R$, $x^*, y^* \in R^*$) preserving linear operations and scalar products in the sense that¹⁷
$$x + y \leftrightarrow x^* + y^*, \qquad \alpha x \leftrightarrow \alpha x^*, \qquad (x, y) = (x^*, y^*).$$

It is well known that any two $n$-dimensional Euclidean spaces are isomorphic to each other, and in particular that every $n$-dimensional Euclidean space is isomorphic to the space $R^n$ of Example 1, p. 144.¹⁸ On the other hand, two infinite-dimensional Euclidean spaces need not be isomorphic. For example, the spaces $l_2$ and $C^2_{[a,b]}$ are not isomorphic, as can be seen from the fact that $l_2$ is complete while $C^2_{[a,b]}$ is not (recall Examples 4 and 5, p. 57). Nevertheless, for Hilbert spaces we have

Theorem 11 (Isomorphism theorem). Any two Hilbert spaces are isomorphic.

Proof. The theorem will be proved once we manage to show that every Hilbert space $H$ is isomorphic to $l_2$. Let $\{\varphi_k\}$ be any complete orthonormal system in $H$ (such exists by the corollary to Theorem 5), and with every element $f \in H$ associate its Fourier coefficients $\{c_k\}$ with respect to $\{\varphi_k\}$. Since
$$\sum_{k=1}^{\infty} c_k^2 < \infty$$
by Theorem 8, the sequence $(c_1, c_2, \ldots, c_k, \ldots)$ belongs to $l_2$. Conversely, by the Riesz-Fischer theorem, to every element $(c_1, c_2, \ldots, c_k, \ldots)$ in $l_2$ there corresponds an element $f \in H$ with the numbers $c_1, c_2, \ldots, c_k, \ldots$ as its Fourier coefficients. This correspondence between the elements of $H$ and those of $l_2$ is obviously one-to-one. Moreover, if
$$f \leftrightarrow (c_1, c_2, \ldots, c_k, \ldots), \qquad f^* \leftrightarrow (c_1^*, c_2^*, \ldots, c_k^*, \ldots),$$
then clearly
$$f + f^* \leftrightarrow (c_1 + c_1^*, c_2 + c_2^*, \ldots, c_k + c_k^*, \ldots), \qquad \alpha f \leftrightarrow (\alpha c_1, \alpha c_2, \ldots, \alpha c_k, \ldots),$$
i.e., sums go into sums and scalar multiples into scalar multiples with the same factor. Finally, by Parseval's theorem,
$$(f, f) = \sum_{k=1}^{\infty} c_k^2, \qquad (f^*, f^*) = \sum_{k=1}^{\infty} c_k^{*2},$$
$$\begin{aligned}
(f, f) + 2(f, f^*) + (f^*, f^*) &= (f + f^*, f + f^*) = \sum_{k=1}^{\infty} (c_k + c_k^*)^2 \\
&= \sum_{k=1}^{\infty} c_k^2 + 2\sum_{k=1}^{\infty} c_k c_k^* + \sum_{k=1}^{\infty} c_k^{*2},
\end{aligned}$$
and hence
$$(f, f^*) = \sum_{k=1}^{\infty} c_k c_k^*,$$
so that scalar products are preserved. ∎

15 Named after the celebrated German mathematician David Hilbert (1862-1943).

16 However, see Sec. 16.9.

17 Isomorphism of two normed linear spaces $R$ and $R^*$ is defined in the same way, except that preservation of scalar products is replaced by preservation of norms, i.e., by the condition $\|x\| = \|x^*\|$.

18 See e.g., G. E. Shilov, op. cit., Theorem 29, p. 144.

Remark. Theorem 11 shows that to within an isomorphism, there is only one Hilbert space (i.e., only one space with the four properties listed above), and that this space has $l_2$ as its “coordinate realization,” just as the space of all ordered $n$-tuples of real numbers with the scalar product $\sum_{k=1}^{n} x_k y_k$ is the “coordinate realization” of axiomatically defined Euclidean $n$-space.
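The correspondence used in the proof of Theorem 11, $f \mapsto (c_1, c_2, \ldots)$, can be tried out in its finite-dimensional analogue, where $l_2$ is replaced by the coordinate space $R^2$ (an assumption made only for illustration). The sketch below takes a rotated orthonormal basis of $R^2$ and checks that the coefficient map sends sums to sums, scalar multiples to scalar multiples, and preserves scalar products. The angle, the vectors and the helper names are hypothetical.

```python
import math

def dot(x, y):
    """Ordinary scalar product in R^2."""
    return sum(a * b for a, b in zip(x, y))

theta = 0.7  # arbitrary angle: any orthonormal basis would do
phis = [[math.cos(theta), math.sin(theta)],
        [-math.sin(theta), math.cos(theta)]]

def coords(f):
    """Map f to its Fourier coefficients (c_1, c_2) relative to {phi_k}."""
    return [dot(f, phi) for phi in phis]

def close(u, v):
    """Componentwise comparison up to rounding error."""
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(u, v))

f, g, alpha = [1.5, -2.0], [0.25, 3.0], -1.75
cf, cg = coords(f), coords(g)

sums_ok = close(coords([x + y for x, y in zip(f, g)]), [u + v for u, v in zip(cf, cg)])
multiples_ok = close(coords([alpha * x for x in f]), [alpha * u for u in cf])
scalar_products_ok = math.isclose(dot(f, g), dot(cf, cg), abs_tol=1e-12)

print(sums_ok, multiples_ok, scalar_products_ok)  # True True True
```

In infinite dimensions the same map needs the Riesz-Fischer theorem to be onto $l_2$; no finite computation can check that part.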

16.7. Subspaces. Orthogonal complements and direct sums. In keeping with the terminology of Sec. 15.2, by a linear manifold in a Hilbert space $H$ we mean a set $L$ of elements of $H$ such that $f, g \in L$ implies $\alpha f + \beta g \in L$ for arbitrary numbers $\alpha$ and $\beta$, while by a subspace of $H$ we mean a closed linear manifold in $H$.

Lemma. If a metric space $R$ has a countable everywhere dense subset, then so does every subset $R' \subset R$.
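Proof. Let
$$\xi_1, \xi_2, \ldots, \xi_n, \ldots$$
be a countable everywhere dense subset of $R$, and let
$$a_n = \inf_{\eta \in R'} \rho(\xi_n, \eta).$$
Then, given any positive integers $n$ and $p$, there is a point $\eta_{np} \in R'$ such that
$$\rho(\xi_n, \eta_{np}) < a_n + \frac{1}{p}.$$
Given any $\varepsilon > 0$ and any $\eta \in R'$, let $p$ be so large that
$$\frac{1}{p} < \frac{\varepsilon}{3},$$
and choose $n$ such that
$$\rho(\xi_n, \eta) < \frac{\varepsilon}{3}.$$
Then, since $a_n \le \rho(\xi_n, \eta)$,
$$\rho(\xi_n, \eta_{np}) < a_n + \frac{1}{p} \le \rho(\xi_n, \eta) + \frac{1}{p} < \frac{2\varepsilon}{3},$$
and hence $\rho(\eta, \eta_{np}) \le \rho(\eta, \xi_n) + \rho(\xi_n, \eta_{np}) < \varepsilon$. In other words, $R'$ has an everywhere dense subset $\{\eta_{np}\}$ ($n, p = 1, 2, \ldots$) containing no more than countably many elements. ∎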

Theorem 12. Every subspace $M$ of a Hilbert space $H$ is either a (complete separable) finite-dimensional Euclidean space or itself a Hilbert space. Moreover, $M$ has an orthonormal basis, like $H$ itself.

Proof. The fact that $M$ has properties 1) and 2) of Definition 5 is obvious (a closed subset of a complete space is complete). The separability of $M$ follows from the lemma. To construct an orthonormal basis in $M$, apply Theorem 5 to any countable everywhere dense subset of $M$. ∎

Subspaces of a Hilbert space $H$ have certain special properties (not shared by subspaces of an arbitrary normed linear space), stemming from the presence of a scalar product in $H$ and the associated concept of orthogonality:

Theorem 13. Let $M$ be a subspace of a Hilbert space $H$, and let
$$M' = H \ominus M$$
denote the orthogonal complement of $M$, i.e., the set of all elements $h' \in H$ orthogonal to every $h \in M$. Then $M'$ is also a subspace of $H$.
Proof. The linearity of $M'$ is obvious, since
$$(h'_1, h) = (h'_2, h) = 0$$
implies
$$(\alpha_1 h'_1 + \alpha_2 h'_2, h) = 0$$
for arbitrary numbers $\alpha_1$ and $\alpha_2$. To show that $M'$ is closed, suppose $\{h'_n\}$ is a sequence of elements of $M'$ converging to $h'$. Then, given any $h \in M$, the continuity of the scalar product (Problem 1) gives
$$(h', h) = \lim_{n \to \infty} (h'_n, h) = 0,$$
and hence $h' \in M'$. ∎

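Remark. By definition, $h' \in M'$ if and only if $h'$ is orthogonal to every $h \in M$. But then $h \in M$ if and only if $h$ is orthogonal to every $h' \in M'$ (the nontrivial half of this assertion follows from the decomposition (26) of Theorem 14). Hence $M' = H \ominus M$ implies $M = H \ominus M'$, and we can call $M$ and $M'$ (mutually) orthogonal subspaces of $H$.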

Theorem 14. Let $M$ be a subspace of a Hilbert space $H$, and let $M' = H \ominus M$ be the orthogonal complement of $M$. Then every element $f \in H$ has a unique representation of the form
$$f = h + h', \tag{26}$$
where $h \in M$, $h' \in M'$.

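Proof. Given any $f \in H$, let $\{\varphi_k\}$ be an orthonormal basis in $M$, and let
$$h = \sum_{k=1}^{\infty} c_k \varphi_k, \qquad c_k = (f, \varphi_k).$$
By Bessel's inequality,
$$\sum_{k=1}^{\infty} c_k^2 < \infty,$$
and hence, by the Riesz-Fischer theorem (applied in the complete space $M$), $h$ exists and belongs to $M$. Let
$$h' = f - h.$$
Then obviously
$$(h', \varphi_k) = 0$$
for all $k$, and since any element $g \in M$ can be represented in the form
$$g = \sum_{k=1}^{\infty} a_k \varphi_k,$$
we have
$$(h', g) = \sum_{k=1}^{\infty} a_k (h', \varphi_k) = 0,$$
i.e., $h' \in M'$. This proves the existence of the representation (26).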
To prove the uniqueness of (26), suppose there is another representation
$$f = h_1 + h'_1,$$
where $h_1 \in M$, $h'_1 \in M'$. Then
$$(h_1, \varphi_k) = (f, \varphi_k) = c_k$$
for all $k$, and hence
$$h_1 = h, \qquad h'_1 = h'. \qquad \blacksquare$$
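The construction in the proof of Theorem 14 (project onto $M$ through an orthonormal basis, then take the remainder) can be sketched in $R^4$ with the ordinary dot product. This is an assumed finite-dimensional illustration: $M$ is spanned by two halved Hadamard rows, and `decompose` is a hypothetical helper name. The printout confirms $h' \perp M$ and the Pythagorean relation $\|f\|^2 = \|h\|^2 + \|h'\|^2$; all numbers are exact binary fractions.

```python
def dot(x, y):
    """Ordinary scalar product in R^4."""
    return sum(a * b for a, b in zip(x, y))

# Orthonormal basis {phi_1, phi_2} of a two-dimensional subspace M of R^4
# (illustrative choice: two halved Hadamard rows).
phis = [[0.5, 0.5, 0.5, 0.5],
        [0.5, -0.5, 0.5, -0.5]]

def decompose(f):
    """Return (h, h_prime) with h in M, h_prime orthogonal to M and f = h + h_prime."""
    c = [dot(f, phi) for phi in phis]  # c_k = (f, phi_k)
    h = [sum(ck * phi[i] for ck, phi in zip(c, phis)) for i in range(len(f))]
    h_prime = [fi - hi for fi, hi in zip(f, h)]
    return h, h_prime

f = [3.0, -1.0, 2.0, 0.5]
h, h_prime = decompose(f)

print(h)                                    # [2.5, -0.25, 2.5, -0.25]
print(h_prime)                              # [0.5, -0.75, -0.5, 0.75]
print([dot(h_prime, phi) for phi in phis])  # [0.0, 0.0]
print(dot(f, f) == dot(h, h) + dot(h_prime, h_prime))  # True
```

Orthogonality of $h'$ to the basis vectors is enough, since every element of $M$ is a linear combination of them, exactly as in the proof.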

Corollary 1. Every orthonormal system $\{\varphi_k\}$ in a Hilbert space $H$ can be enlarged to give a complete orthonormal system in $H$.

Proof. Let $M$ be the linear closure of $\{\varphi_k\}$, so that $\{\varphi_k\}$ is complete in $M$. Let $M' = H \ominus M$ be the orthogonal complement of $M$, and let $\{\psi_k\}$ be a complete orthonormal system in $M'$ (such exists by Theorem 12, since $M'$ is a subspace). Recalling (26), we see that the union of $\{\varphi_k\}$ and $\{\psi_k\}$ is a complete orthonormal system in $H$. ∎

Corollary 2. Let $M$ be a subspace of a Hilbert space $H$, and let $M' = H \ominus M$. Then $M'$ has codimension $n$ if $M$ has dimension $n$, and dimension $n$ if $M$ has codimension $n$.

Proof. An immediate consequence of the representation (26) and Theorem 2, p. 122. ∎


Let $M$ be a subspace of a Hilbert space $H$, with orthogonal complement $M' = H \ominus M$. If every vector $f \in H$ can be represented in the form
$$f = h + h' \qquad (h \in M,\ h' \in M'),$$
we say that $H$ is the direct sum of the orthogonal subspaces $M$ and $M'$, and write
$$H = M \oplus M'.$$
The concept of a direct sum generalizes at once to the case of any finite or even countable number of subspaces. Thus $H$ is said to be the direct sum of the subspaces $M_1, M_2, \ldots, M_n, \ldots$, and we write
$$H = M_1 \oplus M_2 \oplus \cdots \oplus M_n \oplus \cdots,$$
if

1) The subspaces $M_j$ are pairwise orthogonal, i.e., every element in $M_j$ is orthogonal to every element in $M_k$ whenever $j \ne k$;

2) Every element $f \in H$ has a representation of the form
$$f = h_1 + h_2 + \cdots + h_n + \cdots, \tag{27}$$
where $h_n \in M_n$ ($n = 1, 2, \ldots$).
It is easy to see that the representation (27) is unique if it exists and that
$$\|f\|^2 = \sum_{n=1}^{\infty} \|h_n\|^2$$
(give the details).

Besides direct sums of subspaces, we can also talk about direct sums of a finite or countable number of Hilbert spaces. Thus, given two Hilbert spaces $H_1$ and $H_2$, by the direct sum
$$H = H_1 \oplus H_2$$
is meant the set of all ordered pairs $(h_1, h_2)$ with $h_1 \in H_1$, $h_2 \in H_2$, where linear operations and the scalar product in $H$ are defined by
$$\begin{aligned}
(h_1, h_2) + (h'_1, h'_2) &= (h_1 + h'_1,\ h_2 + h'_2), \\
\alpha (h_1, h_2) &= (\alpha h_1,\ \alpha h_2), \\
\bigl((h_1, h_2), (h'_1, h'_2)\bigr) &= (h_1, h'_1) + (h_2, h'_2).
\end{aligned}$$
Consider the subspace of $H$ consisting of all pairs of the form $(h_1, 0)$ and the subspace consisting of all pairs of the form $(0, h_2)$. Then clearly these two subspaces are orthogonal and can be identified in a natural way with $H_1$ and $H_2$, respectively. More generally, given any Hilbert spaces $H_1, H_2, \ldots, H_n, \ldots$, by the direct sum
$$H = H_1 \oplus H_2 \oplus \cdots \oplus H_n \oplus \cdots$$
is meant the set of all sequences
$$h = (h_1, h_2, \ldots, h_n, \ldots) \qquad (h_n \in H_n)$$
such that
$$\sum_{n=1}^{\infty} \|h_n\|^2 < \infty,$$
with linear operations defined in the obvious way and the scalar product of two elements $h = (h_1, h_2, \ldots, h_n, \ldots)$, $g = (g_1, g_2, \ldots, g_n, \ldots)$ defined by
$$(h, g) = \sum_{n=1}^{\infty} (h_n, g_n).$$
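The componentwise definitions above can be sketched for finitely many summands, each modeled as a coordinate space $R^m$ with the ordinary dot product; with finitely many summands the condition $\sum \|h_n\|^2 < \infty$ is automatic. These modeling choices and the names `ds_add`, `ds_scale` and `ds_inner` are assumptions for illustration only. The example embeds $H_1 = R^2$ and $H_2 = R^3$ as the pairs $(h_1, 0)$ and $(0, h_2)$ and checks that they are orthogonal.

```python
def dot(x, y):
    """Ordinary scalar product on a coordinate space R^m."""
    return sum(a * b for a, b in zip(x, y))

# An element of H_1 (+) ... (+) H_N is represented as a list of components h_n.
def ds_add(h, g):
    """Componentwise sum: (h_1 + g_1, ..., h_N + g_N)."""
    return [[a + b for a, b in zip(hn, gn)] for hn, gn in zip(h, g)]

def ds_scale(alpha, h):
    """Componentwise scalar multiple: (alpha h_1, ..., alpha h_N)."""
    return [[alpha * a for a in hn] for hn in h]

def ds_inner(h, g):
    """Scalar product (h, g) = sum_n (h_n, g_n)."""
    return sum(dot(hn, gn) for hn, gn in zip(h, g))

# H = R^2 (+) R^3; copies of H_1 and H_2 sit inside H as (h_1, 0) and (0, h_2).
h = [[1.0, 2.0], [0.0, 0.0, 0.0]]
g = [[0.0, 0.0], [3.0, -1.0, 4.0]]
s = ds_add(h, g)

print(ds_inner(h, g))                                       # 0.0 (orthogonal)
print(ds_inner(s, s))                                       # 31.0 = 5 + 26
print(ds_inner(ds_scale(2.0, s), s) == 2 * ds_inner(s, s))  # True
```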

16.8. Characterization of Euclidean spaces. Given a normed linear space 
R, we now look for circumstances under which R is Euclidean. In other 
words, we look for extra conditions on the norm of R which guarantee that 
the norm be derivable from some suitably defined scalar product in R. 

Theorem 15. A necessary and sufficient condition for a normed linear space $R$ to be Euclidean is that
$$\|f + g\|^2 + \|f - g\|^2 = 2\bigl(\|f\|^2 + \|g\|^2\bigr) \tag{28}$$
for every $f, g \in R$.
Proof. Thinking of $f + g$ and $f - g$ as the “diagonals of the parallelogram in $R$ with sides $f$ and $g$,” we can interpret (28) as the analogue of a familiar property of parallelograms in the plane, i.e., the sum of the squares of the diagonals of a parallelogram equals the sum of the squares of its sides. The necessity of (28) is obvious, since if $R$ is Euclidean, then
$$\begin{aligned}
\|f + g\|^2 + \|f - g\|^2 &= (f + g, f + g) + (f - g, f - g) \\
&= (f, f) + 2(f, g) + (g, g) + (f, f) - 2(f, g) + (g, g) \\
&= 2\bigl(\|f\|^2 + \|g\|^2\bigr).
\end{aligned}$$

To prove the sufficiency of (28), we set
$$(f, g) = \tfrac{1}{4}\bigl(\|f + g\|^2 - \|f - g\|^2\bigr) \tag{29}$$
and show that if (28) holds, then (29) has all the properties of a scalar product listed on p. 142. Since (29) implies
$$(f, f) = \tfrac{1}{4}\bigl(\|2f\|^2 - \|f - f\|^2\bigr) = \|f\|^2, \tag{30}$$
the scalar product (29) clearly generates the given norm $\|\cdot\|$ in $R$. Moreover, it follows at once from (29) and (30) that

1) $(f, f) \ge 0$, where $(f, f) = 0$ if and only if $f = 0$;

2) $(f, g) = (g, f)$.

The proof of the linearity properties
$$(f + g, h) = (f, h) + (g, h) \tag{31}$$
and
$$(\alpha f, g) = \alpha (f, g) \tag{32}$$
requires a little work. To prove (31), consider the function of three vectors
$$\Phi(f, g, h) = 4\bigl[(f + g, h) - (f, h) - (g, h)\bigr],$$
or equivalently
$$\Phi(f, g, h) = \|f + g + h\|^2 - \|f + g - h\|^2 - \|f + h\|^2 + \|f - h\|^2 - \|g + h\|^2 + \|g - h\|^2 \tag{33}$$
after using (29). It follows from (28) that
$$\|f + g \pm h\|^2 = 2\|f \pm h\|^2 + 2\|g\|^2 - \|f \pm h - g\|^2. \tag{34}$$
Substituting (34) into (33), we get
$$\Phi(f, g, h) = -\|f + h - g\|^2 + \|f - h - g\|^2 + \|f + h\|^2 - \|f - h\|^2 - \|g + h\|^2 + \|g - h\|^2. \tag{35}$$
Taking half the sum of (33) and (35), we find that
$$\begin{aligned}
\Phi(f, g, h) &= \tfrac{1}{2}\bigl(\|g + h + f\|^2 + \|g + h - f\|^2\bigr) - \tfrac{1}{2}\bigl(\|g - h + f\|^2 + \|g - h - f\|^2\bigr) \\
&\quad - \|g + h\|^2 + \|g - h\|^2,
\end{aligned}$$
which becomes
$$\Phi(f, g, h) = \bigl(\|g + h\|^2 + \|f\|^2\bigr) - \bigl(\|g - h\|^2 + \|f\|^2\bigr) - \|g + h\|^2 + \|g - h\|^2 = 0$$
after applying (28) to both expressions in parentheses. But $\Phi(f, g, h) = 0$ is equivalent to (31).

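To prove (32), we introduce the function
$$\varphi(c) = (cf, g) - c(f, g),$$
where $f$ and $g$ are fixed but arbitrary elements of $R$. It follows at once from (29) that
$$\varphi(0) = \tfrac{1}{4}\bigl(\|g\|^2 - \|{-g}\|^2\bigr) = 0$$
and $\varphi(-1) = 0$, since $(-f, g) = -(f, g)$. Hence, given any integer $n$, repeated use of (31) gives
$$(nf, g) = \bigl(\operatorname{sgn} n\,(f + \cdots + f), g\bigr) = \operatorname{sgn} n\,\bigl[(f, g) + \cdots + (f, g)\bigr] = |n| \operatorname{sgn} n\,(f, g) = n(f, g),$$
where each sum contains $|n|$ terms, i.e., $\varphi(n) = 0$. Moreover, given any integers $p, q$ ($q \ne 0$),
$$q\Bigl(\frac{p}{q} f, g\Bigr) = \Bigl(q \cdot \frac{p}{q} f, g\Bigr) = (pf, g) = p(f, g),$$
i.e., $\varphi(c) = 0$ for all rational $c$. But $\varphi(c)$ is a continuous function of $c$ (why?), and hence $\varphi(c) = 0$ for all real $c$, which is equivalent to (32). ∎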
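Formula (29) (often called the polarization identity) can be tested numerically. The following sketch, which assumes norms on $R^3$ and $R^2$ given as plain Python functions, checks that for the Euclidean norm the parallelogram condition (28) holds, (29) reproduces the ordinary dot product, and the additivity (31) holds up to rounding, while for the maximum norm (28) already fails. The helper names and random data are hypothetical.

```python
import math
import random

def euclidean_norm(x):
    return math.sqrt(sum(a * a for a in x))

def max_norm(x):
    return max(abs(a) for a in x)

def polarized(norm, f, g):
    """Candidate scalar product (29): (||f + g||^2 - ||f - g||^2) / 4."""
    s = [a + b for a, b in zip(f, g)]
    d = [a - b for a, b in zip(f, g)]
    return (norm(s) ** 2 - norm(d) ** 2) / 4

def parallelogram_defect(norm, f, g):
    """Left-hand side minus right-hand side of (28)."""
    s = [a + b for a, b in zip(f, g)]
    d = [a - b for a, b in zip(f, g)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(f) ** 2 + norm(g) ** 2)

random.seed(1)
f, g, h = ([random.uniform(-1, 1) for _ in range(3)] for _ in range(3))
fg = [a + b for a, b in zip(f, g)]

defect_ok = abs(parallelogram_defect(euclidean_norm, f, g)) < 1e-12
recovers_dot = math.isclose(polarized(euclidean_norm, f, g),
                            sum(a * b for a, b in zip(f, g)), abs_tol=1e-12)
additive = math.isclose(polarized(euclidean_norm, fg, h),
                        polarized(euclidean_norm, f, h) + polarized(euclidean_norm, g, h),
                        abs_tol=1e-12)
print(defect_ok, recovers_dot, additive)  # True True True

# For the maximum norm, (28) fails already for f = (1, 1), g = (1, -1).
print(parallelogram_defect(max_norm, [1.0, 1.0], [1.0, -1.0]))  # 4.0
```

A numerical check at a few points cannot replace the proof, of course; it only shows that the identities are being applied in the right form.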

Example 1. The $n$-dimensional space $R_p^n$, equipped with the norm
$$\|x\|_p = \Bigl(\sum_{k=1}^{n} |x_k|^p\Bigr)^{1/p},$$
is a normed linear space if $p \ge 1$ (see Example 10, p. 41) and a Euclidean space if $p = 2$ (see Example 1, p. 144). However, $R_p^n$ ($n \ge 2$) fails to be Euclidean if $p \ne 2$. In fact, for the two vectors
$$f = (1, 1, 0, \ldots, 0), \qquad g = (1, -1, 0, \ldots, 0),$$
we have
$$f + g = (2, 0, 0, \ldots, 0), \qquad f - g = (0, 2, 0, \ldots, 0),$$
and hence
$$\|f\|_p = \|g\|_p = 2^{1/p}, \qquad \|f + g\|_p = \|f - g\|_p = 2.$$
Therefore the “parallelogram condition” (28) fails if $p \ne 2$, since its left-hand side equals $8$ while its right-hand side equals $4 \cdot 2^{2/p}$.
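The computation in Example 1 is easy to reproduce. This sketch (an illustration assuming $n = 3$; the helper names are hypothetical) evaluates the difference between the two sides of (28) for the vectors $f$ and $g$ above and compares it with the closed form $8 - 4 \cdot 2^{2/p}$, which vanishes only at $p = 2$.

```python
def p_norm(x, p):
    """||x||_p = (sum_k |x_k|^p)^(1/p) on R^n, for p >= 1."""
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def defect(p, n=3):
    """Left minus right side of (28) for f = (1, 1, 0, ...), g = (1, -1, 0, ...)."""
    f = [1.0, 1.0] + [0.0] * (n - 2)
    g = [1.0, -1.0] + [0.0] * (n - 2)
    s = [a + b for a, b in zip(f, g)]
    d = [a - b for a, b in zip(f, g)]
    return p_norm(s, p) ** 2 + p_norm(d, p) ** 2 - 2 * (p_norm(f, p) ** 2 + p_norm(g, p) ** 2)

for p in (1, 1.5, 2, 3, 10):
    # Compare with the closed form 8 - 4 * 2**(2/p) derived in the text.
    print(p, defect(p), 8 - 4 * 2 ** (2 / p))
```

The defect is negative for $1 \le p < 2$, zero (up to rounding) at $p = 2$, and positive for $p > 2$.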

Example 2. Consider the space $C_{[0, \pi/2]}$ of all functions continuous on the interval $[0, \pi/2]$, and let
$$f(t) = \cos t, \qquad g(t) = \sin t.$$
Then
$$\|f\| = \|g\| = 1,$$
and
$$\|f + g\| = \max_{0 \le t \le \pi/2} |\cos t + \sin t| = \sqrt{2}, \qquad \|f - g\| = \max_{0 \le t \le \pi/2} |\cos t - \sin t| = 1.$$
Therefore
$$\|f + g\|^2 + \|f - g\|^2 = 3 \ne 4 = 2\bigl(\|f\|^2 + \|g\|^2\bigr).$$
It follows that the norm in $C_{[0, \pi/2]}$ cannot be generated by any scalar product whatsoever, i.e., the space $C_{[0, \pi/2]}$ fails to be Euclidean. It is easy to see that the same is true of the space $C_{[a, b]}$ for any $a$ and $b$ ($a < b$).
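The four norms in Example 2 can be approximated by sampling. The sketch below is an assumption-laden illustration: it replaces the maximum over $[0, \pi/2]$ by a maximum over a uniform grid of 10001 points, which happens to contain the points $0$, $\pi/4$ and $\pi/2$ where the maxima are attained. It reproduces the values $3$ and $4$ for the two sides of (28).

```python
import math

def sup_norm(func, a=0.0, b=math.pi / 2, samples=10_001):
    """Approximate max |func(t)| over [a, b] by sampling a uniform grid.

    With an odd number of samples the grid contains a, b and the midpoint.
    """
    return max(abs(func(a + (b - a) * i / (samples - 1))) for i in range(samples))

f, g = math.cos, math.sin
norm_f, norm_g = sup_norm(f), sup_norm(g)
norm_sum = sup_norm(lambda t: f(t) + g(t))   # attained at t = pi/4
norm_diff = sup_norm(lambda t: f(t) - g(t))  # attained at t = 0

lhs = norm_sum ** 2 + norm_diff ** 2   # approximately 2 + 1 = 3
rhs = 2 * (norm_f ** 2 + norm_g ** 2)  # approximately 2 * (1 + 1) = 4
print(lhs, rhs)
```

Sampling only gives a lower estimate of a maximum in general; here it is adequate because the extremal points lie on the grid.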

16.9. Complex Euclidean spaces. Besides real Euclidean spaces, we can also consider complex Euclidean spaces, i.e., complex linear spaces equipped with a scalar product. However, we must now modify the properties of the scalar product listed on p. 142, since in the complex case these properties are contradictory as they stand. In fact, it follows from properties 2) and 3), p. 142 that
$$(\lambda x, \lambda x) = \lambda^2 (x, x),$$
and hence, after choosing $\lambda = i$, that
$$(ix, ix) = -(x, x),$$
i.e., the squared norms $(x, x)$ and $(ix, ix)$ of the vectors $x$ and $ix$ cannot both be positive, contrary to property 1). To remedy this difficulty, we define the scalar product in a complex Euclidean space $R$ as a complex-valued function $(x, y)$, defined for every pair of elements $x, y \in R$, with the following properties:

1') $(x, x) \ge 0$, where $(x, x) = 0$ if and only if $x = 0$;

2') $(x, y) = \overline{(y, x)}$;

3') $(\lambda x, y) = \lambda (x, y)$;

4') $(x + y, z) = (x, z) + (y, z)$

(valid for all $x, y, z \in R$ and all complex $\lambda$). It follows from 2') and 3') that
$$(x, \lambda y) = \overline{(\lambda y, x)} = \bar{\lambda}\, \overline{(y, x)} = \bar{\lambda} (x, y)$$
(as usual, the overbar denotes the complex conjugate).
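The difference between the naive bilinear formula and a scalar product satisfying 1')-4') can be seen in $C^3$ using Python's built-in `complex` type and its `conjugate()` method. In this sketch the vectors and the number $\lambda$ are arbitrary assumed data, and the conjugate is placed on the second factor, matching property 3') and the scalar product of Example 1 below. It shows that the bilinear formula gives $(ix, ix) = -(x, x)$, exactly the contradiction described above, while the conjugated formula gives $(ix, ix) = (x, x) = 15$ and is conjugate-linear in its second argument.

```python
def inner(x, y):
    """Complex scalar product (x, y) = sum_k x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def bilinear(x, y):
    """Naive formula sum_k x_k * y_k without conjugation (for contrast)."""
    return sum(a * b for a, b in zip(x, y))

x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 4 + 0j, 1j]
lam = 0.5 - 1.5j
ix = [1j * a for a in x]

print(bilinear(ix, ix) == -bilinear(x, x))  # True: the contradiction in the text
print(inner(x, x) == inner(ix, ix) == 15)   # True: property 1') is respected
# (x, lam * y) equals conj(lam) * (x, y), as derived from 2') and 3').
print(abs(inner(x, [lam * b for b in y]) - lam.conjugate() * inner(x, y)) < 1e-12)
```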
Example 1. The space $C^n$ introduced in Example 2, p. 119 becomes a complex Euclidean space if we define the scalar product of two elements $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$ in $C^n$ as
$$(x, y) = \sum_{k=1}^{n} x_k \bar{y}_k.$$

Example 2. The complex space $l_2$ with elements $x = (x_1, x_2, \ldots, x_k, \ldots)$, $y = (y_1, y_2, \ldots, y_k, \ldots), \ldots$, where
$$\sum_{k=1}^{\infty} |x_k|^2 < \infty, \qquad \sum_{k=1}^{\infty} |y_k|^2 < \infty,$$
becomes an infinite-dimensional complex Euclidean space when equipped with the scalar product
$$(x, y) = \sum_{k=1}^{\infty} x_k \bar{y}_k.$$

Example 3. The space $C^2_{[a,b]}$ of all complex-valued functions continuous on the interval $[a, b]$, equipped with the scalar product
$$(f, g) = \int_a^b f(t)\, \overline{g(t)}\, dt,$$
is another example of an infinite-dimensional complex Euclidean space.

The norm (length) of a vector in a complex Euclidean space is defined by the same formula
$$\|x\| = \sqrt{(x, x)}$$
as in the real case. However, the notion of the angle between two vectors $x$ and $y$ plays no role in the complex case, since the quantity
$$\frac{(x, y)}{\|x\|\, \|y\|}$$
is in general complex and hence cannot be the cosine of a real angle. On the other hand, the notion of orthogonality is defined in the same way as before, i.e., two elements $x$ and $y$ of a complex Euclidean space are said to be orthogonal if $(x, y) = 0$.

Let $\{\varphi_k\}$ be any orthogonal system in a complex Euclidean space $R$, and let $f$ be any element of $R$. Then, just as in the real case, the numbers
$$a_k = \frac{(f, \varphi_k)}{\|\varphi_k\|^2}$$
and the series
$$\sum_{k=1}^{\infty} a_k \varphi_k$$
are called the Fourier coefficients and the Fourier series of the element $f$ with respect to the system $\{\varphi_k\}$. In the complex case, Bessel's inequality (17') becomes
$$\sum_{k=1}^{\infty} |a_k|^2 \|\varphi_k\|^2 \le \|f\|^2.$$
If the system $\{\varphi_k\}$ is orthonormal, the Fourier coefficients become
$$a_k = c_k = (f, \varphi_k),$$
and Bessel's inequality simplifies to
$$\sum_{k=1}^{\infty} |c_k|^2 \le \|f\|^2.$$

By a complex Hilbert space is meant a complex Euclidean space which is complete, separable and infinite-dimensional. Theorem 11 carries over at once to the complex case, with isomorphism being defined exactly as in Definition 6:

Theorem 11' (Isomorphism theorem). Any two complex Hilbert spaces are isomorphic.

Proof. This time show that every complex Hilbert space is isomorphic to the complex space $l_2$, the “coordinate realization” of a complex Hilbert space. ∎

Remark. As an exercise, the reader should state and prove the complex 
analogues of all the other theorems of Sec. 16. 

Problem 1. Prove that in a Euclidean space, the operations of addition, multiplication by numbers and the formation of scalar products are all continuous. More exactly, prove that if $x_n \to x$, $y_n \to y$ (in the sense of norm convergence) and $\lambda_n \to \lambda$ (in the sense of ordinary convergence), then
$$x_n + y_n \to x + y, \qquad \lambda_n x_n \to \lambda x, \qquad (x_n, y_n) \to (x, y).$$

Hint. Use Schwarz's inequality.

Problem 2. Let $R$ be the set of all functions $f$ defined on the interval $[0, 1]$ such that

1) $f(t)$ is nonzero at no more than countably many points $t_1, t_2, \ldots$;

2) $\sum_{i=1}^{\infty} f^2(t_i) < \infty$.

Define addition of elements and multiplication of elements by scalars in the ordinary way, i.e., $(f + g)(t) = f(t) + g(t)$, $(\alpha f)(t) = \alpha f(t)$. If $f$ and $g$ are two elements of $R$, nonzero only at the points $t_1, t_2, \ldots$ and $\tau_1, \tau_2, \ldots$,
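respectively, define the scalar product of $f$ and $g$ as
$$(f, g) = \sum_{\substack{i, j = 1 \\ t_i = \tau_j}}^{\infty} f(t_i)\, g(\tau_j),$$
i.e., as the sum of $f(t)g(t)$ over the points $t$ at which both functions are nonzero. Prove that this scalar product makes $R$ into a Euclidean space. Prove that $R$ is nonseparable, i.e., that $R$ contains no countable everywhere dense subset.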

Problem 3. Give an example of a (nonseparable) Euclidean space which 
has no orthonormal basis. Prove that a complete Euclidean space (not 
necessarily separable) always has an orthonormal basis. 

Problem 4. Prove that every nested sequence of nonempty closed bounded 
convex sets in a complete Euclidean space (not necessarily separable) has a 
nonempty intersection. 

Comment. Cf. Problem 6, p. 66 and Problem 2, p. 141.

Problem 5. Given a Euclidean space $R$, let $\varphi_1, \varphi_2, \ldots, \varphi_k, \ldots$ be an orthonormal basis in $R$ and $f$ an arbitrary element of $R$. Prove that the element
$$f - \sum_{k=1}^{n} \alpha_k \varphi_k$$
is orthogonal to all linear combinations of the form
$$\sum_{k=1}^{n} \beta_k \varphi_k$$
if and only if
$$\alpha_k = (f, \varphi_k) \qquad (k = 1, 2, \ldots, n).$$

Problem 6. According to elementary geometry, the length of the perpendicular dropped from a point $P$ to a line $L$ or plane $\Pi$ is smaller than the length of any other line segment joining $P$ to $L$ or $\Pi$. What is the natural generalization of this fact to the case of an arbitrary Euclidean space?

Hint. Use Theorem 6 and the result of the preceding problem.

Problem 7. Let $R$ be a complete Euclidean space (not necessarily separable), so that $R$ has an orthonormal basis $\{\varphi_\alpha\}$, by Problem 3. Prove that every vector $f \in R$ satisfies the formulas
$$f = \sum_{\alpha} (f, \varphi_\alpha)\, \varphi_\alpha, \qquad \|f\|^2 = \sum_{\alpha} |(f, \varphi_\alpha)|^2,$$
where neither sum contains more than countably many nonzero terms.

Problem 8. Give an example of a Euclidean space $R$ and an orthonormal system $\{\varphi_n\}$ in $R$ such that $R$ contains no nonzero element orthogonal to every $\varphi_n$, even though $\{\varphi_n\}$ fails to be complete.

Comment. By Theorem 10, $R$ cannot be complete.

Problem 9. Given a Euclidean space $R$, not necessarily complete, let $R^*$ be the completion of $R$ as defined in Sec. 7.4. Define linear operations and the scalar product in $R^*$ by “continuous extension” of those in $R \subset R^*$. More exactly, if $x_n \to x$, $y_n \to y$ where $x_n, y_n \in R$, let
$$x + y = \lim_{n \to \infty} (x_n + y_n), \qquad \alpha x = \lim_{n \to \infty} \alpha x_n, \qquad (x, y) = \lim_{n \to \infty} (x_n, y_n).$$
Prove that

a) These limits exist and are independent of the choice of the sequences $\{x_n\}$, $\{y_n\}$ in $R$ converging to $x$ and $y$;

b) $R^*$ is itself a Euclidean space.

Complete $C^2_{[a,b]}$ in this way, and show that the resulting space is a Hilbert space.

Comment. The elements belonging to the completion of $C^2_{[a,b]}$ but not to $C^2_{[a,b]}$ are themselves functions, in fact discontinuous functions whose squares are Lebesgue-integrable on $[a, b]$, as defined in Sec. 29.

Problem 10. Prove that each of the following sets is a subspace of the Hilbert space $l_2$:

a) The set of all $(x_1, x_2, \ldots, x_k, \ldots) \in l_2$ such that $x_1 = x_2$;

b) The set of all $(x_1, x_2, \ldots, x_k, \ldots) \in l_2$ such that $x_k = 0$ for all even $k$.

Problem 11. Show that every complex Euclidean space of finite dimension $n$ is isomorphic to the space $C^n$ of Example 1, p. 164. Generalize Problem 9 to the case where $C^2_{[a,b]}$ is the complex space of Example 3, p. 164.

17. Topological Linear Spaces 

17.1. Definitions and examples. Specification of a norm is only one way 
of introducing a topology into a linear space. There are many situations in 
analysis, notably in the theory of generalized functions (to be discussed 
in Sec. 21), where it is desirable to use other methods of equipping a linear 
space with a topology: 

Definition 1. By a topological linear space is meant a set $E$ with the following properties:

1) $E$ is a linear space;

2) $E$ is a topological space;

3) The operations of addition of elements of $E$ and multiplication of elements of $E$ by numbers (real or complex) are continuous with respect to the topology in $E$, in the sense that

a) If $z_0 = x_0 + y_0$, then, given any neighborhood $U$ of the point $z_0$, there are neighborhoods $V$ and $W$ of the points $x_0$ and $y_0$, respectively, such that $x + y \in U$ whenever $x \in V$, $y \in W$;

b) If $\alpha_0 x_0 = y_0$, then, given any neighborhood $U$ of the point $y_0$, there is a neighborhood $V$ of the point $x_0$ and a number $\varepsilon > 0$ such that $\alpha x \in U$ whenever $x \in V$, $|\alpha - \alpha_0| < \varepsilon$.

Theorem 1. Let $E$ be a topological linear space, and let $U$ be any neighborhood of zero. Then the set
$$U + x_0 = \{y : y = x + x_0,\ x \in U\}$$
is a neighborhood of $x_0$. Moreover, every neighborhood of $x_0$ is a set of this form, i.e., some neighborhood of zero “shifted by the vector $x_0$.”

Proof. It follows from property 3a) that the mapping $f(x) = x - x_0$ carrying $E$ into itself is continuous. Hence, by Theorem 10, p. 87, the preimage $f^{-1}(U)$ of any neighborhood $U$ of the point zero is itself a neighborhood. But $f^{-1}(U) = U + x_0$. Therefore $U + x_0$ is a neighborhood, obviously of the point $x_0$. Similarly, given any neighborhood $V$ of the point $x_0$, let $U = V - x_0 = V + (-x_0)$. Then $U$ is a neighborhood of zero, by the continuity of the mapping $g(x) = x + x_0$. But clearly $U + x_0 = V$. ∎

Remark. Thus the topology in $E$ is determined by giving a neighborhood base at zero, i.e., a system $\mathcal{N}_0$ of neighborhoods of zero with the property that, given any open set $G \subset E$ containing the point zero, there is a neighborhood $N \in \mathcal{N}_0$ contained in $G$. In fact, the mapping $f(x) = x + x_0$ carries a neighborhood base at zero into a neighborhood base at $x_0$. Hence $\mathcal{N}_0$ and its “translates,” i.e., the system of all sets of the form $\{V : V = U + x,\ U \in \mathcal{N}_0,\ x \in E\}$, constitute a base for the topology in $E$. In this sense, $\mathcal{N}_0$ “generates” the topology in $E$.

Example 1. Every normed linear space is clearly a topological linear 
space. In fact, it is an immediate consequence of the properties of a norm 
that the operations of addition of vectors and multiplication of vectors by 
scalars are continuous with respect to the topology “induced” by the norm. 

Example 2. Let $R^\infty$ be the linear space of all numerical sequences $x = (x_1, \ldots, x_k, \ldots)$, real or complex, and let $\mathcal{N}_0$ consist of all sets of the form
$$U_{k_1, \ldots, k_r; \varepsilon} = \{x : x \in R^\infty,\ |x_{k_1}| < \varepsilon, \ldots, |x_{k_r}| < \varepsilon\}$$
for some number $\varepsilon > 0$ and positive integers $k_1, \ldots, k_r$. Then $R^\infty$ becomes a topological linear space when equipped with the topology generated by $\mathcal{N}_0$.¹⁹
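To make these neighborhoods concrete, the following sketch tests membership in a basic set $U_{k_1, \ldots, k_r; \varepsilon}$, representing a sequence by a Python function of its index (a modeling assumption; `in_basic_neighborhood` and `spike` are hypothetical names). Because each such set constrains only finitely many coordinates, the sequence $x^{(n)} = n e_n$ (value $n$ in position $n$, zeros elsewhere) lies in any given basic neighborhood as soon as $n > \max k_i$, so $x^{(n)} \to 0$ in $R^\infty$ even though its nonzero coordinates grow without bound.

```python
def in_basic_neighborhood(x, indices, eps):
    """Test whether x lies in U_{k_1,...,k_r; eps} = {x : |x_{k_i}| < eps for all i}.

    x is a function k -> x_k (k = 1, 2, ...); only the finitely many
    coordinates listed in `indices` are inspected.
    """
    return all(abs(x(k)) < eps for k in indices)

def spike(n):
    """The sequence x^(n) = n * e_n: value n at position n, zero elsewhere."""
    return lambda k: float(n) if k == n else 0.0

indices, eps = (1, 4, 9), 0.01
members = [n for n in range(1, 20) if in_basic_neighborhood(spike(n), indices, eps)]
print(members)  # every n from 1 to 19 except 1, 4 and 9
```

Any choice of indices and $\varepsilon$ behaves the same way: only the finitely many spikes sitting on a constrained coordinate are excluded.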

19 As an exercise, verify that $\mathcal{N}_0$ and its translates satisfy Theorem 2 (or Theorem 3) of Sec. 9.3 and that the linear operations in $R^\infty$ are continuous with respect to the topology generated by $\mathcal{N}_0$.


Example 3. Let $K_{[a,b]}$ be the linear space of all infinitely differentiable functions on the interval $[a, b]$,²⁰ and let $\mathcal{N}_0$ consist of all sets of the form
$$U_{r, \varepsilon} = \{\varphi : \varphi \in K_{[a,b]},\ |\varphi^{(0)}(x)| < \varepsilon, \ldots, |\varphi^{(r)}(x)| < \varepsilon \text{ for all } x \in [a, b]\}$$
for some number $\varepsilon > 0$ and positive integer $r$. Then $K_{[a,b]}$ becomes a topological linear space when equipped with the topology generated by this neighborhood base (again supply some missing details).

Definition 2. A subset $M$ of a topological linear space $E$ is said to be bounded if, given any neighborhood $U$ of zero, there is a number $\alpha > 0$ such that $M \subset \alpha U = \{z : z = \alpha x,\ x \in U\}$.²¹

Definition 3. A topological linear space E is said to be locally bounded 
if it contains at least one nonempty bounded open set. 

Theorem 2. Every normed linear space E is locally bounded. 

Proof. Given any $\varepsilon > 0$, the set of all $x \in E$ such that $\|x\| < \varepsilon$ is obviously nonempty, bounded and open. ∎

Definition 4. A topological linear space E is said to be locally convex 
if every nonempty open set in E contains a nonempty convex open subset. 

Theorem 3. Every normed linear space E is locally convex. 

Proof. Merely note that every nonempty open set in $E$ contains an open sphere, and every open sphere in a normed linear space is convex. ∎

Remark. It follows from Theorems 2 and 3 that every normed linear space is both locally bounded and locally convex. Conversely, it can be shown that every locally bounded and locally convex topological linear space satisfying the first axiom of separation is normable, in the sense that $E$ can be equipped with a norm $\|\cdot\|$ generating the given topology in $E$, via the metric $\rho(x, y) = \|x - y\|$.

17.2. Historical remarks. For some time it was thought that the concept of a normed linear space (introduced in the thirties, notably in the work of Banach) was general enough to serve all the concrete needs of analysis. However, it subsequently became apparent that this was not so and that there are a number of problems involving such spaces as the space of infinitely differentiable functions, the space $R^\infty$ of all numerical sequences, etc., in which the natural topology cannot be specified in terms of any norm whatsoever. Thus topological linear spaces, as opposed to normed linear spaces, are by no means “exotic” or “pathological.” On the contrary, some of these spaces are no less natural and important a generalization of finite-dimensional Euclidean space than, say, Hilbert space.

20 A function $\varphi$ is said to be infinitely differentiable if it has derivatives of all orders $k = 0, 1, 2, \ldots$ (the zeroth derivative $\varphi^{(0)}$ is just the function $\varphi$ itself).

21 A sequence $\{x_n\}$ of points in $E$ is said to be bounded if the set $\{x_1, x_2, \ldots, x_n, \ldots\}$, consisting of all terms of the sequence, is bounded.

Problem 1. Reconcile Definition 2 with Problem 1, p. 141 in the case where E is a normed linear space.

Problem 2. Let E be a topological linear space. Prove that

a) If U and V are open sets, then so is $U + V = \{z : z = x + y,\ x \in U,\ y \in V\}$;

b) If U is open, then so is $\alpha U = \{z : z = \alpha x,\ x \in U\}$, provided that α ≠ 0;

c) If $F \subset E$ is closed, then so is αF for arbitrary α.

Problem 3. Prove that a topological linear space is a $T_1$-space if and only if the intersection of all neighborhoods of zero contains no nonzero elements.

Problem 4. Prove that a topological linear space E automatically has the following separation property: Given any point $x \in E$ and any neighborhood U of x, there is another neighborhood V of x such that $[V] \subset U$.

Hint. If U is a neighborhood of zero, then, by the continuity of subtraction, there is a neighborhood V of zero such that$^{22}$

$$V - V = \{z : z = x - y,\ x \in V,\ y \in V\} \subset U.$$

Suppose $y \in [V]$. Then every neighborhood of y, in particular $V + y$, contains a point of V. Hence there is a point $z \in V$ such that $z + y \in V$. It follows that $y = (z + y) - z \in V - V \subset U$.

Problem 5. Prove that a topological space T has the separation property figuring in Problem 4 if and only if for each point $x \in T$ and each closed set $F \subset T$ not containing x, there is an open set $O_1$ containing x and an open set $O_2$ containing F such that $O_1 \cap O_2 = \varnothing$.

Comment. Thus, for $T_1$-spaces, this separation property is “halfway between” that of a Hausdorff space and that of a normal space.

Problem 6. Given a topological linear space E, prove that

a) If $\{x_n\}$ is a convergent sequence of points in E, then the set $M = \{x_1, x_2, \dots, x_n, \dots\}$ is bounded;

b) A subset $M \subset E$ is bounded if and only if, given any sequence $\{x_n\}$ of points in M and any sequence $\{\varepsilon_n\}$ of positive numbers converging to zero, the sequence $\{\varepsilon_n x_n\}$ also converges to zero.


22 Here the minus sign in $V - V$ does not have the usual meaning of a set difference (the same kind of notation was used in Sec. 14.5).
Problem 7. Prove that

a) The space $R^\infty$ of Example 2, p. 168 is not locally bounded;

b) Every locally bounded topological linear space satisfies the first axiom 
of countability. 

Problem 8. Let x be any point of a locally convex topological linear 
space E, and let U be any neighborhood of x. Prove that x has a convex 
neighborhood contained in U. 

Hint. It is enough to consider the case x = 0. Suppose U is a neighborhood of zero. Then there is a neighborhood V of zero such that $V - V \subset U$, where $V - V$ is the same as in the hint to Problem 4. Since E is locally convex, there is a nonempty convex open set $V' \subset V$. If $x_0 \in V'$, then $V' - x_0$ is a convex neighborhood of zero contained in U.

Problem 9. Prove that an open set U in a topological linear space is convex if and only if $U + U = 2U$.

Problem 10. Given a linear space E, a set $U \subset E$ is said to be symmetric if $x \in U$ implies $-x \in U$. Let $\mathcal{B}$ be the set of all convex symmetric subsets of E such that each coincides with its own interior. Prove that

a) $\mathcal{B}$ is a system of neighborhoods of zero determining a locally convex topology τ in E which satisfies the first axiom of separation;

b) The topology τ is the strongest locally convex topology compatible with the linear operations in E;

c) Every linear functional on E is continuous with respect to τ.

Problem 11. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ in a linear space E are said to be compatible if, whenever a sequence $\{x_k\}$ in E is fundamental with respect to both norms and converges to a limit $x \in E$ with respect to one of them, it also converges to the same limit x with respect to the other norm. A linear space E equipped with a countable system of compatible norms $\|\cdot\|_n$ is said to be countably normed. Prove that every countably normed linear space becomes a topological linear space when equipped with the topology generated by the neighborhood base consisting of all sets of the form

$$U_{r,\varepsilon} = \{x : x \in E,\ \|x\|_1 < \varepsilon, \dots, \|x\|_r < \varepsilon\} \qquad (1)$$

for some number ε > 0 and positive integer r.

Problem 12. Prove that each of the following spaces is countably normed, i.e., in each case verify the compatibility of the given system of norms $\|\cdot\|_n$:

a) The space $K_{[a,b]}$ of infinitely differentiable functions on $[a, b]$, equipped with the norms

$$\|f\|_n = \sup_{\substack{a \le t \le b \\ 0 \le q \le n}} |f^{(q)}(t)| \qquad (n = 0, 1, 2, \dots); \qquad (2)$$

b) The space $S_\infty$ of all infinitely differentiable functions f(t) on (−∞, ∞) such that f(t) and all its derivatives approach zero as |t| → ∞ faster than any power of 1/|t| (i.e., such that $t^p f^{(q)}(t) \to 0$ as |t| → ∞ for arbitrary p and q), equipped with the norms

$$\|f\|_n = \sup_{\substack{-\infty < t < \infty \\ p,\, q \le n}} |t^p f^{(q)}(t)| \qquad (n = 0, 1, 2, \dots);$$

c) The space Φ of all numerical sequences $x = (x_1, \dots, x_k, \dots)$ such that

$$\sum_{k=1}^{\infty} k^n x_k^2$$

converges for all n = 0, 1, 2, ..., equipped with the norms

$$\|x\|_n = \sqrt{\sum_{k=1}^{\infty} k^n x_k^2} \qquad (n = 0, 1, 2, \dots).$$

Show that (1) and (2) define the same topology in $K_{[a,b]}$ as in Example 3, p. 169.

Comment. Φ might be called the space of “rapidly decreasing sequences.”

Problem 13. A norm $\|\cdot\|_1$ is said to be stronger than a norm $\|\cdot\|_2$ if there is a constant c > 0 such that $\|x\|_1 \ge c\,\|x\|_2$ for all $x \in E$ (then $\|\cdot\|_2$ is said to be weaker than $\|\cdot\|_1$). Discuss the norms (2) in this language.

Comment. Two norms are said to be comparable if one is stronger than 
the other, and equivalent if one is both stronger and weaker than the other 
(cf. Problem 7, p. 141). 

Problem 14. Prove that every countably normed space satisfies the first 
axiom of countability. 

Hint. Replace the system of neighborhoods $U_{r,\varepsilon}$ by the subsystem such that ε takes only the values

$$1, \frac{1}{2}, \dots, \frac{1}{n}, \dots$$

(this can be done without changing the topology).

Comment. Thus the topology in E can be described in terms of convergent 
sequences (recall Sec. 9.4). 

Problem 15. Prove that the topology in a countably normed space can be specified in terms of the metric

$$\rho(x, y) = \sum_{n=1}^{\infty} \frac{1}{2^n} \frac{\|x - y\|_n}{1 + \|x - y\|_n} \qquad (x, y \in E). \qquad (3)$$

First verify that ρ(x, y) has all the properties of a metric, and is invariant under shifts in the sense that $\rho(x + z, y + z) = \rho(x, y)$ for all $x, y, z \in E$.

Comment. A countably normed space is said to be complete if it is 
complete with respect to the metric (3). 
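The metric (3) can be evaluated directly for the space Φ of Problem 12c. The following is a minimal Python sketch under two simplifying assumptions of our own, not the text's: the sequences are finitely supported, so every norm $\|x\|_n = \sqrt{\sum_k k^n x_k^2}$ is a finite sum, and the series (3) is truncated after a fixed number of terms (each summand is less than $2^{-n}$, so truncating after N terms changes ρ by less than $2^{-N}$). The helper names `phi_norm`, `rho` and `add` are illustrative only. The sketch lets one observe numerically the metric properties and the shift invariance asked for in Problem 15.

```python
import math


def phi_norm(x, n):
    """||x||_n = sqrt(sum_k k^n x_k^2) for a finitely supported sequence x = (x_1, x_2, ...)."""
    return math.sqrt(sum(k ** n * xk * xk for k, xk in enumerate(x, start=1)))


def _pad(x, length):
    """Extend x by zeros: x_k = 0 beyond its (finite) support."""
    return list(x) + [0.0] * (length - len(x))


def rho(x, y, terms=40):
    """Truncated metric (3): sum over n = 1..terms of 2^-n * t_n / (1 + t_n), t_n = ||x - y||_n.

    Every omitted summand is below 2^-n, so the truncation error is below 2^-terms.
    """
    length = max(len(x), len(y))
    diff = [a - b for a, b in zip(_pad(x, length), _pad(y, length))]
    total = 0.0
    for n in range(1, terms + 1):
        t = phi_norm(diff, n)
        total += t / (1.0 + t) / 2 ** n
    return total


def add(x, y):
    """Coordinatewise sum of two finitely supported sequences."""
    length = max(len(x), len(y))
    return [a + b for a, b in zip(_pad(x, length), _pad(y, length))]


x = [1.0, 0.5, 0.25]
y = [0.0, 1.0]
z = [2.0, -1.0, 0.0, 3.0]
print(rho(x, y), rho(add(x, z), add(y, z)))   # equal up to rounding: shift invariance
```

Since each summand lies in $[0, 2^{-n})$, the value of ρ always lies in [0, 1); this boundedness is why the weights $2^{-n}$ and the factor $t/(1+t)$ are used in (3).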

Problem 16. Prove that a sequence $\{x_k\}$ in a countably normed space is fundamental with respect to the metric (3) if and only if it is fundamental with respect to each of the norms $\|\cdot\|_n$. Prove that $\{x_k\}$ converges to an element $x \in E$ with respect to the metric (3) if and only if it converges to x with respect to each of the norms $\|\cdot\|_n$.

Comment. Thus, in particular, a countably normed space E is complete if and only if every sequence $\{x_k\}$ in E which is fundamental with respect to each of the norms $\|\cdot\|_n$ converges.

Problem 17. An infinite-dimensional separable linear space H equipped with a countable system of scalar products $(\cdot, \cdot)_n$ is said to be countably Hilbert if the norms

$$\|x\|_n = \sqrt{(x, x)_n} \qquad (x \in H)$$

generated by these scalar products are compatible and if the space H is complete. Prove that the space Φ of Problem 12c is countably Hilbert when equipped with the scalar products

$$(x, y)_n = \sum_{k=1}^{\infty} k^n x_k y_k \qquad (n = 0, 1, 2, \dots),$$

where $x = (x_1, \dots, x_k, \dots)$ and $y = (y_1, \dots, y_k, \dots)$ are any two elements of Φ.

Problem 18. The norms $\|\cdot\|_n$ in a countably normed space E can be assumed to satisfy the condition

$$\|x\|_k \le \|x\|_l \quad \text{if } k < l, \qquad (4)$$

since otherwise we can replace $\|\cdot\|_n$ by

$$\|\cdot\|_n' = \sup\{\|\cdot\|_1, \|\cdot\|_2, \dots, \|\cdot\|_n\}.$$

(Prove that this does not change the topology in E.) Let $E_n$ denote the completion of E with respect to the norm $\|\cdot\|_n$. Using (4), prove that

$$E_1 \supset E_2 \supset \cdots \supset E_n \supset \cdots.$$

Clearly,

$$E \subset \bigcap_{n=1}^{\infty} E_n.$$

Prove that E is complete if and only if

$$E = \bigcap_{n=1}^{\infty} E_n.$$


Problem 19. Let $C^{(n)}_{[a,b]}$ be the space of all functions defined on the interval $[a, b]$ with continuous derivatives up to order n inclusive, equipped with the norm

$$\|f\|_n = \sup_{\substack{a \le t \le b \\ 0 \le q \le n}} |f^{(q)}(t)|$$

(note that $C^{(0)}_{[a,b]} = C_{[a,b]}$). Prove that $C^{(n)}_{[a,b]}$ is complete. Prove that $K_{[a,b]}$ equals the intersection

$$\bigcap_{n=0}^{\infty} C^{(n)}_{[a,b]}$$

and hence is complete (by Problem 18).
18. Continuous Linear Functionals

18.1. Continuous linear functionals on a topological linear space. A (real) functional f defined on a topological linear space E is said to be linear on E if

$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$

for all $x, y \in E$ and arbitrary numbers α, β (recall Sec. 13.5), and continuous at the point $x_0 \in E$ if, given any ε > 0, there is a neighborhood U of $x_0$ such that

$$|f(x) - f(x_0)| < \varepsilon \qquad (1)$$

for all $x \in U$ (recall Sec. 9.6). We say that the functional f is continuous (on E) if it is continuous at every point $x_0 \in E$.

Theorem 1. Let f be a linear functional on a topological linear space E, and suppose f is continuous at some point $x_0 \in E$. Then f is continuous on E, i.e., at every point of E.

Proof. Given any point $y \in E$ and any number ε > 0, let U be a neighborhood of $x_0$ such that $x \in U$ implies (1). Then

$$V = U + (y - x_0) = \{z : z = x + y - x_0,\ x \in U\}$$

is a neighborhood of y, by Theorem 1, p. 168. Moreover, $x \in V$ implies $x + x_0 - y \in U$ and hence

$$|f(x) - f(y)| = |f(x + x_0 - y) - f(x_0)| < \varepsilon,$$

i.e., f is continuous at y. ∎
Corollary. The continuity of a linear functional on a topological linear
space need only be checked at a single point, for example, at the point zero. 

Theorem 2. Let f be a linear functional on a topological linear space E. Then f is continuous on E if and only if f is bounded in some neighborhood of zero.$^{1}$

Proof. Suppose f is continuous on E, in particular at the point zero. Then, given any ε > 0, there is a neighborhood of zero in which |f(x)| < ε. Obviously, f is bounded in this neighborhood.

Conversely, suppose f is bounded in some neighborhood U of zero, so that |f(x)| < C for all $x \in U$, where C is a suitable constant. Then, given any ε > 0, we have |f(x)| < ε for all x in the neighborhood

$$\frac{\varepsilon}{C}\,U = \left\{ z : z = \frac{\varepsilon}{C}\,x,\ x \in U \right\},$$

i.e., f is continuous at zero and hence on all of E. ∎

Theorem 3. A necessary condition for a linear functional f to be continuous on a topological linear space E is that f be bounded on every bounded set. The condition is also sufficient if E satisfies the first axiom of countability.

Proof. To prove the necessity, suppose f is continuous on E. Then f is bounded in some neighborhood U of zero:

$$|f(x)| < C \qquad (x \in U).$$

Let $M \subset E$ be any bounded set, as defined in Definition 2, p. 169. Then $M \subset \alpha U$ for some α > 0, and hence

$$|f(x)| < C\alpha \qquad (x \in M),$$

i.e., f is bounded on M.

As for the sufficiency, let $\{U_n\}$ be a countable neighborhood base at the point zero such that

$$U_1 \supset U_2 \supset \cdots \supset U_n \supset \cdots$$

(cf. the proof of Theorem 7, p. 84). If f fails to be continuous on E, then, by Theorem 2, it cannot be bounded on any of these neighborhoods of zero. Therefore in each $U_n$ there is a point $x_n$ such that $|f(x_n)| > n$. The sequence $\{x_n\}$ is bounded (recall footnote 21, p. 169), and even converges to zero, while the sequence $\{f(x_n)\}$ is unbounded. But then f fails to be bounded on the bounded set $\{x_1, x_2, \dots, x_n, \dots\}$, contrary to hypothesis. ∎

Guided by Theorem 3, we introduce 


1 Recall footnote 14, p. 110. 


Definition 1. Given a linear functional f on a topological linear space
E, suppose f is bounded on every bounded subset of E. Then f is said to be 
a bounded linear functional. 

Remark. In general, a bounded linear functional need not be continuous. 


18.2. Continuous linear functionals on a normed linear space. Suppose E is a normed linear space, so that in particular E satisfies the first axiom of countability (recall the remark on p. 83). Then, by Theorem 3, a linear functional on E is continuous if and only if it is bounded. But by a bounded set in a normed linear space we mean a set contained in some closed sphere $\|x\| \le C$ (recall Problem 1, p. 141). Therefore a linear functional f on a normed linear space is bounded (and hence continuous) if and only if it is bounded on every closed sphere $\|x\| \le C$, or equivalently on the closed unit sphere $\|x\| \le 1$, because of the linearity of f. In other words, f is bounded if and only if the number

$$\|f\| = \sup_{\|x\| \le 1} |f(x)| \qquad (2)$$

is finite.

Definition 2. Given a bounded linear functional f on a normed linear space E, the number (2), equal to the least upper bound of |f(x)| on the closed unit sphere $\|x\| \le 1$, is called the norm of f.

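Theorem 4. The norm $\|f\|$ has the following two properties:

$$\|f\| = \sup_{x \neq 0} \frac{|f(x)|}{\|x\|}, \qquad (3)$$

$$|f(x)| \le \|f\|\,\|x\| \quad \text{for all } x \in E. \qquad (4)$$

Proof. Clearly,

$$\|f\| = \sup_{\|x\| \le 1} |f(x)| = \sup_{\|x\| = 1} |f(x)|$$

(why?). But the set of all vectors in E of norm 1 coincides with the set of all vectors

$$\frac{x}{\|x\|} \qquad (x \in E,\ x \neq 0), \qquad (5)$$

and hence

$$\|f\| = \sup_{\|x\| = 1} |f(x)| = \sup_{x \neq 0} \left| f\left(\frac{x}{\|x\|}\right) \right| = \sup_{x \neq 0} \frac{|f(x)|}{\|x\|},$$

which proves (3). Moreover, since the vectors (5) all have norm 1, it follows from (2) that

$$\left| f\left(\frac{x}{\|x\|}\right) \right| = \frac{|f(x)|}{\|x\|} \le \|f\| \qquad (x \in E,\ x \neq 0),$$

which implies (4) for x ≠ 0. The validity of (4) for x = 0 is obvious. ∎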
Example 1. Let $R^n$ be Euclidean n-space, and let a be any fixed nonzero vector in $R^n$. Then the scalar product

$$f(x) = (x, a) \qquad (x \in R^n)$$

defines a functional on $R^n$ which is obviously linear. By Schwarz’s inequality,

$$|f(x)| = |(x, a)| \le \|x\|\,\|a\|. \qquad (6)$$

Therefore f is bounded and hence continuous on $R^n$. It follows from (6) that

$$\frac{|f(x)|}{\|x\|} \le \|a\| \qquad (x \neq 0). \qquad (7)$$

The right-hand side of (7) is independent of x, and hence

$$\sup_{x \neq 0} \frac{|f(x)|}{\|x\|} \le \|a\|,$$

i.e.,

$$\|f\| \le \|a\|.$$

But choosing x = a, we get

$$|f(a)| = |(a, a)| = \|a\|^2,$$

or equivalently

$$\frac{|f(a)|}{\|a\|} = \|a\|.$$

It follows from (3) that

$$\|f\| = \|a\|.$$
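Formula (3) can also be probed numerically. The Python sketch below is our own illustration; the vector a = (3, −1, 2), the sample count and the random seed are arbitrary choices. It estimates $\sup_{x \neq 0} |f(x)|/\|x\|$ for $f(x) = (x, a)$ by sampling random directions. By Schwarz’s inequality (6), no sample can exceed $\|a\|$, while the choice x = a attains it, exactly as in the argument above.

```python
import math
import random


def dot(u, v):
    """Euclidean scalar product (u, v) in R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def euclidean_norm(u):
    return math.sqrt(dot(u, u))


def sampled_norm(f, dim, samples=20_000, seed=0):
    """Lower estimate of ||f|| = sup_{x != 0} |f(x)| / ||x|| (formula (3)) from random x."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        nx = euclidean_norm(x)
        if nx > 0.0:
            best = max(best, abs(f(x)) / nx)
    return best


a = [3.0, -1.0, 2.0]              # the fixed vector a of Example 1 (values chosen arbitrarily)
f = lambda x: dot(x, a)           # f(x) = (x, a)

estimate = sampled_norm(f, len(a))
print(estimate, euclidean_norm(a))    # the estimate never exceeds ||a|| and approaches it from below
print(abs(f(a)) / euclidean_norm(a))  # the choice x = a attains ||a||
```

Random sampling only ever gives a lower bound for a supremum; the exact value comes from the two inequalities proved in the text, not from the sampling.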


Example 2. More generally, let R be an arbitrary Euclidean space, and let a be a fixed element of R. Then the same argument as in the preceding example shows that the scalar product

$$f(x) = (x, a) \qquad (x \in R)$$

defines a bounded linear functional on R, with norm

$$\|f\| = \|a\|.$$

Example 3. The integral

$$I(x) = \int_a^b x(t)\,dt$$

is a linear functional on the space $C_{[a,b]}$. Since

$$|I(x)| = \left| \int_a^b x(t)\,dt \right| \le \max_{a \le t \le b} |x(t)|\,(b - a) = \|x\|\,(b - a),$$

where the equality holds if x(t) ≡ const, we see that the functional I is bounded, with norm

$$\|I\| = b - a. \qquad (8)$$
Example 4. More generally, let $y_0(t)$ be a fixed function in $C_{[a,b]}$, and let

$$I(x) = \int_a^b x(t)\,y_0(t)\,dt.$$

Then I is a linear functional on $C_{[a,b]}$. Since

$$|I(x)| = \left| \int_a^b x(t)\,y_0(t)\,dt \right| \le \|x\| \int_a^b |y_0(t)|\,dt,$$

where the equality holds if x(t) ≡ const and $y_0(t)$ does not change sign, while in general |I(x)| can be brought arbitrarily close to $\int_a^b |y_0(t)|\,dt$ by continuous functions x(t) with $\|x\| \le 1$ approximating sign $y_0(t)$, the functional I is bounded, with norm

$$\|I\| = \int_a^b |y_0(t)|\,dt. \qquad (9)$$

Note that (9) reduces to (8) in the case $y_0(t) \equiv 1$.

Example 5. As in Example 3, p. 124, let

$$\delta_{t_0}(x) = x(t_0)$$

be the linear functional on $C_{[a,b]}$ which assigns to each function $x(t) \in C_{[a,b]}$ its value at some fixed point $t_0 \in [a, b]$. Clearly

$$|x(t_0)| \le \max_{a \le t \le b} |x(t)| = \|x\|,$$

where equality holds if x(t) ≡ const. Hence $\delta_{t_0}$ is bounded, with norm

$$\|\delta_{t_0}\| = 1.$$
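Formula (9) is easiest to appreciate when $y_0$ changes sign, because a constant x(t) then no longer attains the norm. The Python sketch below is an illustration under our own assumptions: [a, b] = [0, 1], the hypothetical kernel $y_0(t) = t - 1/2$ (so (9) gives 1/4), and integrals approximated by the midpoint rule. It evaluates I on x(t) ≡ 1 and on the continuous functions $x_\delta(t) = \max(-1, \min(1, y_0(t)/\delta))$, which satisfy $\|x_\delta\| \le 1$ and approximate sign $y_0(t)$. For this kernel a short calculation gives $I(x_\delta) = 1/4 - \delta^2/3$.

```python
def midpoint_integral(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def clip(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))


a, b = 0.0, 1.0
y0 = lambda t: t - 0.5            # hypothetical kernel y_0(t); it changes sign at t = 1/2


def I(x):
    """The functional of Example 4, I(x) = integral of x(t) y_0(t) dt, evaluated numerically."""
    return midpoint_integral(lambda t: x(t) * y0(t), a, b)


norm_by_formula_9 = midpoint_integral(lambda t: abs(y0(t)), a, b)   # exact value: 1/4

# A constant x(t) = 1 (so ||x|| = 1) gives I(x) = 0 for this kernel, far below the norm.
value_on_constant = I(lambda t: 1.0)


def x_delta(delta):
    """Continuous function with ||x_delta|| <= 1 that approximates sign y_0(t)."""
    return lambda t: clip(y0(t) / delta)


approximations = [I(x_delta(d)) for d in (0.1, 0.01, 0.001)]
print(norm_by_formula_9, value_on_constant, approximations)
```

The approximations increase toward 1/4 as δ shrinks but never exceed it, which is the content of (9) for this kernel.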


The concept of the norm of a bounded linear functional on a normed linear space can be given a simple geometric interpretation. As shown in Theorem 4, p. 127, every nontrivial linear functional f can be associated with a hyperplane

$$M_f = \{x : f(x) = 1\}.$$

Let d be the distance from the hyperplane $M_f$ to the point x = 0, defined as

$$d = \inf_{f(x) = 1} \|x\|$$

(cf. Problem 9, p. 54). Since, as always,

$$|f(x)| \le \|f\|\,\|x\|,$$

f(x) = 1 implies

$$\|x\| \ge \frac{1}{\|f\|} \qquad (x \in M_f),$$

and hence

$$d \ge \frac{1}{\|f\|}. \qquad (10)$$
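On the other hand, it follows from (3) that, given any ε > 0, there is an element $x_\varepsilon$ such that $f(x_\varepsilon) = 1$ and

$$\frac{1}{\|x_\varepsilon\|} > \|f\| - \varepsilon$$

(take any x with $|f(x)|/\|x\| > \|f\| - \varepsilon$ and set $x_\varepsilon = x/f(x)$). Therefore

$$(\|f\| - \varepsilon)\,\|x_\varepsilon\| < 1,$$

and hence

$$d = \inf_{f(x) = 1} \|x\| \le \|x_\varepsilon\| < \frac{1}{\|f\| - \varepsilon}$$

for every ε with $0 < \varepsilon < \|f\|$. Since ε > 0 is arbitrary, it follows that

$$d \le \frac{1}{\|f\|}. \qquad (11)$$

Comparing (10) and (11), we get

$$\|f\| = \frac{1}{d},$$

i.e., the norm of the linear functional f equals the reciprocal of the distance between the hyperplane f(x) = 1 and the point x = 0.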
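The identity $\|f\| = 1/d$ holds for any norm, not only a Euclidean one. As an illustrative check (the example is our choice, not the text’s), take $R^3$ with the norm $\|x\|_1 = |x_1| + |x_2| + |x_3|$ and $f(x) = a_1x_1 + a_2x_2 + a_3x_3$. Then $|f(x)| \le \max_i |a_i|\,\|x\|_1$, with equality at a suitable coordinate vector, so $\|f\| = \max_i |a_i|$. The point $e_j/a_j$, for an index j with $|a_j|$ maximal, lies on $M_f$ at distance $1/\|f\|$ from the origin. The Python sketch below verifies this point and confirms that randomly sampled points of $M_f$ are never closer to the origin; the coefficients and seed are arbitrary.

```python
import random


def l1_norm(x):
    return sum(abs(v) for v in x)


def f_of(a, x):
    """The linear functional f(x) = sum_i a_i x_i."""
    return sum(ai * xi for ai, xi in zip(a, x))


def sampled_distance(a, samples=20_000, seed=1):
    """Smallest ||x||_1 over random points x rescaled onto M_f = {x : f(x) = 1}."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0) for _ in a]
        fx = f_of(a, x)
        if abs(fx) > 1e-12:
            best = min(best, l1_norm(x) / abs(fx))   # equals ||x / f(x)||_1, a point of M_f
    return best


a = [0.5, -2.0, 1.0]                      # illustrative coefficients
norm_f = max(abs(ai) for ai in a)         # ||f|| for the l1-norm on R^3
j = max(range(len(a)), key=lambda i: abs(a[i]))
x_star = [0.0] * len(a)
x_star[j] = 1.0 / a[j]                    # lies on M_f, with ||x_star||_1 = 1 / ||f||
print(norm_f, f_of(a, x_star), l1_norm(x_star), sampled_distance(a))
```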


18.3. The Hahn-Banach theorem for a normed linear space. Let $f_0(x)$ be a linear functional defined on a subspace L of a linear space E, satisfying the condition

$$|f_0(x)| \le p(x), \qquad (12)$$

where p is a finite convex functional on E. Then, according to the Hahn-Banach theorem (Theorem 5, p. 132), $f_0$ can be extended onto the whole space E without violating the condition (12). As applied to bounded linear functionals on a normed linear space E, this result can be formulated as follows:

Theorem 5 (Hahn-Banach). Given a real normed linear space E, let L be a subspace of E and $f_0$ a bounded linear functional on L. Then $f_0$ can be extended to a bounded linear functional f on the whole space E without increasing its norm, i.e.,

$$\|f\|_{\text{on } E} = \|f_0\|_{\text{on } L}.$$

Proof. We need only choose the functional p in Theorem 5, p. 132 to be the convex functional $k\|x\|$, where

$$k = \|f_0\|_{\text{on } L}. \ ∎$$

This form of the Hahn-Banach theorem has a simple geometric interpretation. The equation

$$f_0(x) = 1 \qquad (13)$$

specifies a hyperplane in the subspace L, at distance

$$\frac{1}{\|f_0\|}$$

from the origin (the point x = 0). The fact that the functional $f_0$ can be extended onto the whole space E without increasing its norm means that the hyperplane (13) can be extended to a larger hyperplane in the whole space E in such a way that the distance between the larger hyperplane and the origin is the same as the distance between the hyperplane (13) and the origin.

In the same way, starting from the complex version of the Hahn-Banach theorem (Theorem 5', p. 134), we get

Theorem 5'. Given a complex normed linear space E, let L be a subspace of E and $f_0$ a bounded linear functional on L. Then $f_0$ can be extended to a bounded linear functional f on the whole space E without increasing its norm, i.e.,

$$\|f\|_{\text{on } E} = \|f_0\|_{\text{on } L}.$$


In the case of an arbitrary topological linear space E, a nontrivial continuous linear functional on E may not even exist. However, by imposing suitable restrictions on E, we can guarantee the existence of “sufficiently many” continuous linear functionals on E.$^{2}$


Definition 3. A topological linear space E is said to have sufficiently many continuous linear functionals if for each pair of distinct points $x_1, x_2 \in E$ there exists a continuous linear functional f on E such that $f(x_1) \neq f(x_2)$, or equivalently, if for each nonzero element $x_0 \in E$ there exists a continuous linear functional f on E such that $f(x_0) \neq 0$.


Theorem 6. Every normed linear space E has sufficiently many continuous linear functionals.

Proof. Given any nonzero element $x_0 \in E$, we define a linear functional

$$f_0(\lambda x_0) = \lambda$$

on the set L of all elements of the form $\lambda x_0$; this functional is bounded on L, since $|f_0(\lambda x_0)| = |\lambda| = \|\lambda x_0\|/\|x_0\|$, so that its norm on L is $1/\|x_0\|$. We then use the Hahn-Banach theorem to extend $f_0$ onto the whole space E. This gives a continuous linear functional f on E such that $f(x_0) = 1 \neq 0$. ∎
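In a Euclidean space the extension used in the proof of Theorem 6 can be written down explicitly by orthogonal projection: for $f_0(\lambda x_0) = \lambda$ on $L = \{\lambda x_0\}$ one may take $f(x) = (x, x_0)/\|x_0\|^2$, which by Example 1 has norm $\|x_0\|/\|x_0\|^2 = 1/\|x_0\| = \|f_0\|_{\text{on } L}$. The Python sketch below checks this numerically in $R^3$ (the vector $x_0$ and the seed are arbitrary choices). It illustrates the statement of the Hahn-Banach theorem in this special case only; in a general normed space no such formula is available, and the theorem’s proof is not constructive.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def extend_from_line(x0):
    """Euclidean special case: the extension f(x) = (x, x0) / ||x0||^2 of f0(lam * x0) = lam."""
    s = dot(x0, x0)
    return lambda x: dot(x, x0) / s


x0 = [1.0, 2.0, 2.0]                  # illustrative; ||x0|| = 3, so ||f0|| on L is 1/3
f = extend_from_line(x0)
norm_on_L = 1.0 / math.sqrt(dot(x0, x0))

# Sampled lower estimate of the norm of f on all of R^3.
rng = random.Random(2)
worst_ratio = 0.0
for _ in range(10_000):
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    worst_ratio = max(worst_ratio, abs(f(x)) / math.sqrt(dot(x, x)))
print(f(x0), f([2.0 * v for v in x0]), worst_ratio, norm_on_L)
```

Here f agrees with $f_0$ on L and no sampled ratio exceeds $1/\|x_0\|$, i.e., the extension does not increase the norm.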


Problem 1. Prove that a functional f on a $T_1$-space E is continuous at a point $x \in E$ if and only if $x_n \to x$ implies $f(x_n) \to f(x)$.

Problem 2. Prove that every linear functional on a finite-dimensional 
topological linear space is automatically continuous. 

Problem 3. Let E be a topological linear space. Prove that a linear functional f on E is continuous if and only if

a) Its null space $\{x : f(x) = 0\}$ is closed in E;

b) There exists an open set $U \subset E$ and a number t such that $t \notin f(U)$.


2 See Theorem 6 and Problems 7-8. 
Problem 4. Given a topological linear space E, prove that

a) If every linear functional on E is continuous, then the topology in E is the topology τ of Problem 10, p. 171;

b) If E is infinite-dimensional and normable, then there exists a noncontinuous linear functional on E;

c) If E has a neighborhood base at zero whose power does not exceed the algebraic dimension of E, then there exists a noncontinuous linear functional on E.

Hint. In b) use the existence of a Hamel basis in E (recall Problem 4, 
p. 128, where algebraic dimension is also defined). 

Problem 5. Prove that

$$f(x) = a\,x(0) + b\,x(1),$$

$$g(x) = \int_0^{1/2} x(t)\,dt - \int_{1/2}^{1} x(t)\,dt$$

are both bounded linear functionals on the space $C_{[0,1]}$. What are their norms?

Problem 6. As in Problem 11, p. 171, let E be a countably normed space with norms $\|\cdot\|_n$, where

$$\|x\|_1 \le \|x\|_2 \le \cdots \le \|x\|_n \le \cdots \qquad (14)$$

(as in Problem 18, p. 173, this condition entails no loss of generality). Let $E^*$ be the set of all continuous linear functionals on E, and let $E_n^*$ be the set of all linear functionals on E which are continuous with respect to the norm $\|\cdot\|_n$. Prove that

$$E_1^* \subset E_2^* \subset \cdots \subset E_n^* \subset \cdots$$

and

$$E^* = \bigcup_{n=1}^{\infty} E_n^*. \qquad (15)$$

Hint. If f is a continuous linear functional on E, then, by Theorem 2, there is a neighborhood U of zero in which f is bounded. It follows from (14) and the definition of the topology in E that there is a number ε > 0 and a positive integer k such that the open sphere $\|x\|_k < \varepsilon$ is contained in U. Being bounded on this sphere, f is bounded and continuous with respect to the norm $\|\cdot\|_k$.

Comment. Let f be a continuous linear functional on E, i.e., let $f \in E^*$. Then by the order of f is meant the smallest integer n for which $f \in E_n^*$. It follows from (15) that every continuous linear functional on E is of finite order.


Problem 7. Prove that every countably normed space E has sufficiently many continuous linear functionals.

Hint. Given any nonzero element $x_0 \in E$, use Theorem 6 to construct a linear functional f, continuous with respect to one of the norms $\|\cdot\|_n$, such that

$$f(x_0) \neq 0.$$

Problem 8. Show that every real locally convex topological linear space E satisfying the first axiom of separation has sufficiently many continuous linear functionals.

Hint. Given any nonzero element $x_0 \in E$, show that there is a convex symmetric$^{3}$ neighborhood U of zero such that $x_0 \notin U$. Let $p_U$ be the Minkowski functional of U. Then, as in the proof of Theorem 6, p. 136, $p_U$ is a finite convex functional on E such that $p_U(-x) = p_U(x)$ and

$$p_U(x) \le 1 \ \text{ if } x \in U, \qquad p_U(x_0) \ge 1.$$

Define a linear functional $f_0(\lambda x_0) = \lambda$ on the set L of all elements of the form $\lambda x_0$. Clearly $|f_0(x)| \le p_U(x)$ on L and $f_0(x_0) = 1$. Now use the Hahn-Banach theorem to extend $f_0$ onto the whole space E.

Comment. The importance of locally convex spaces is mainly due to this 
property (which continues to hold in the complex case). 
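The Minkowski functional in the hint to Problem 8 is $p_U(x) = \inf\{t > 0 : x/t \in U\}$. For a convex set U containing an open ball around zero, the condition $x/t \in U$ holds for every t above $p_U(x)$ and for no t below it, so $p_U(x)$ can be approximated by bisection given only a membership test for U. The Python sketch below does this; the function name, the search bound `hi` and the two test sets are our own illustrative choices. For the open unit ball of the max-norm it recovers $\max_i |x_i|$, and for the open unit ball of $\|\cdot\|_1$ it recovers $\|x\|_1$, in line with the general fact that the Minkowski functional of the open unit ball of a norm is that norm.

```python
def minkowski(in_U, x, hi=1e6, iters=200):
    """Approximate p_U(x) = inf{t > 0 : x / t in U} by bisection.

    Assumes U is convex and contains an open ball around 0, so that x / t lies in U
    for every t above p_U(x) and for no t below it.
    """
    if all(v == 0 for v in x):
        return 0.0
    if not in_U([v / hi for v in x]):
        raise ValueError("hi is too small: x / hi is not in U")
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if in_U([v / mid for v in x]):
            hi = mid          # x / mid in U, so p_U(x) <= mid
        else:
            lo = mid          # x / mid not in U, so p_U(x) >= mid
    return hi


in_max_ball = lambda y: max(abs(v) for v in y) < 1.0   # open unit ball of the max-norm
in_l1_ball = lambda y: sum(abs(v) for v in y) < 1.0    # open unit ball of the l1-norm

x = [0.3, -2.5, 1.0]
print(minkowski(in_max_ball, x), minkowski(in_l1_ball, x))   # approximately 2.5 and 3.8
```

Both test sets are symmetric, so $p_U(-x) = p_U(x)$, and a point $x_0$ outside the open set U gets $p_U(x_0) \ge 1$, which are the two properties used in the hint.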

19. The Conjugate Space 

19.1. Definition of the conjugate space. The operations of addition of 
functionals and multiplication of functionals by numbers are defined in the 
obvious way: 

Definition 1. Let f and g be two functionals defined on a topological linear space E, and let α be any number. Then by the sum of f and g, denoted by f + g, is meant the functional whose value at every point $x \in E$ is the sum of the values of f and g at x, while by the product of α and f, denoted by αf, is meant the functional whose value at every point $x \in E$ is the product of α and the value of f at x. More concisely,

$$(f + g)(x) = f(x) + g(x), \qquad (\alpha f)(x) = \alpha f(x)$$

for every $x \in E$.

Clearly, if f and g are linear functionals, then so are f + g and αf. Moreover, if f and g are bounded (and hence continuous), so are f + g and αf.


3 Recall Problem 10, p. 171. 
Let E* be the set of all continuous linear functionals on E. Then the space E*, called the conjugate space of E, is itself a linear space, when equipped with the operations of addition of functionals and multiplication of functionals by numbers. This can be seen at once by verifying the three axioms in Definition 1, p. 118. Note that the zero element in E* is the functional f = 0, equal to zero for all $x \in E$.

The next step is to introduce a topology in E*, besides the linear operations just described. This can be done in various ways. First we consider the particularly simple case where the original space E is a normed linear space.


19.2. The conjugate space of a normed linear space. Let f be a continuous linear functional on a normed linear space E. In Sec. 18.2 we introduced the concept of the norm of f, equal to

$$\|f\| = \sup_{x \neq 0} \frac{|f(x)|}{\|x\|}$$

(recall Theorem 4, p. 177). This quantity clearly has all the properties of a norm, as listed on p. 138. In fact,

1) $\|f\| \ge 0$, where $\|f\| = 0$ if and only if f = 0;

2) $\|\alpha f\| = |\alpha|\,\|f\|$;

3) $\|f + g\| \le \|f\| + \|g\|$, since obviously

$$\sup_{x \neq 0} \frac{|f(x) + g(x)|}{\|x\|} \le \sup_{x \neq 0} \frac{|f(x)|}{\|x\|} + \sup_{x \neq 0} \frac{|g(x)|}{\|x\|}.$$

Hence the space E* conjugate to E can be made into a normed linear space by simply equipping each functional $f \in E^*$ with its norm $\|f\|$. The corresponding topology in E* is called the strong topology in E*. In cases where we want to emphasize that E* is equipped with the norm $\|\cdot\|$, we will write $(E^*, \|\cdot\|)$ instead of E*.


Example 1. Let E be Euclidean n-space (real or complex), and let $e_1, \dots, e_n$ be any basis in E, so that every vector $x \in E$ has a unique representation of the form

$$x = \sum_{k=1}^{n} x_k e_k.$$

If f is a linear functional on E, then clearly

$$f(x) = \sum_{k=1}^{n} f(e_k)\,x_k. \qquad (1)$$

Thus a linear functional on E is uniquely determined by its values on the basis vectors $e_1, \dots, e_n$, where these values can be assigned arbitrarily.
Consider the linear functionals $f_1, \dots, f_n$ defined by

$$f_j(e_k) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \neq k. \end{cases}$$

It is clear that these functionals are linearly independent, and moreover that

$$f_j(x) = x_j.$$

Hence we can write (1) in the form

$$f(x) = \sum_{k=1}^{n} f(e_k)\,f_k(x).$$

Thus the functionals $f_1, \dots, f_n$ form a basis in the space E*, called the dual of the basis $e_1, \dots, e_n$ in the original space E. Therefore E* is itself an n-dimensional linear space. Of course, different norms in E “induce” different norms in E* (see Problem 1).
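In coordinates, the dual basis of Example 1 amounts to a matrix inversion: if the basis vectors $e_1, \dots, e_n$ are the columns of an invertible matrix B, then the rows of $B^{-1}$ represent $f_1, \dots, f_n$, because $f_j(e_k) = (B^{-1}B)_{jk}$. The Python sketch below uses a small Gauss-Jordan routine with exact `Fraction` arithmetic; the basis of $R^3$ and the functional f are arbitrary illustrative choices. It lets one check $f_j(e_k) = \delta_{jk}$ and the expansion $f(x) = \sum_k f(e_k) f_k(x)$.

```python
from fractions import Fraction


def inverse(m):
    """Inverse of an invertible square matrix by Gauss-Jordan elimination, in exact arithmetic."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def apply(row, x):
    """Evaluate the functional represented by the coefficient list `row` at the vector x."""
    return sum(r * xi for r, xi in zip(row, x))


basis = [[1, 0, 1], [1, 1, 0], [0, 1, 2]]                 # e_1, e_2, e_3 (illustrative basis of R^3)
n = len(basis)
B = [[basis[k][i] for k in range(n)] for i in range(n)]   # column k of B is e_k
F = inverse(B)                                            # row j of F represents f_j

f = [Fraction(2), Fraction(-1), Fraction(1, 2)]           # a functional f(x) = 2x_1 - x_2 + x_3/2
x = [3, -1, 4]
coords = [apply(F[k], x) for k in range(n)]               # f_k(x): coordinates of x in the basis e_k
expansion = sum(apply(f, basis[k]) * coords[k] for k in range(n))
print(apply(f, x), expansion)                             # both equal f(x)
```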

Example 2. Let $c_0$ be the space of all sequences $x = (x_1, \dots, x_k, \dots)$ converging to zero, with norm

$$\|x\| = \sup_k |x_k|.$$

Then the space $(c_0^*, \|\cdot\|)$ conjugate to $c_0$ is isomorphic (see footnote 17, p. 155) to the space $l_1$ of all absolutely summable sequences $\bar f = (f_1, \dots, f_k, \dots)$,$^{4}$ with norm

$$\|\bar f\| = \sum_{k=1}^{\infty} |f_k|.$$

To prove this, we first note that, given any element $\bar f = (f_1, \dots, f_k, \dots) \in l_1$, the formula

$$f(x) = \sum_{k=1}^{\infty} x_k f_k \qquad (2)$$

defines a functional f on the space $c_0$, where f is clearly linear. Moreover, it follows from (2) that

$$|f(x)| \le \|x\| \sum_{k=1}^{\infty} |f_k| = \|x\|\,\|\bar f\|,$$

and hence

$$\|f\| \le \|\bar f\|. \qquad (3)$$

4 A sequence $\{f_k\}$, or $\bar f = (f_1, \dots, f_k, \dots)$ in “point notation,” is said to be absolutely summable if

$$\sum_{k=1}^{\infty} |f_k| < \infty.$$
Consider the vectors

$$e_1 = (1, 0, 0, \dots), \quad e_2 = (0, 1, 0, \dots), \quad e_3 = (0, 0, 1, \dots), \quad \dots$$

in $c_0$, and let

$$x^{(n)} = \sum_{k=1}^{n} \frac{f_k}{|f_k|}\,e_k$$

(if $f_k = 0$, we set $f_k/|f_k| = 0$). Then $x^{(n)} \in c_0$, and

$$\|x^{(n)}\| \le 1. \qquad (4)$$

Moreover

$$f(x^{(n)}) = \sum_{k=1}^{n} \frac{f_k}{|f_k|}\,f_k = \sum_{k=1}^{n} |f_k|,$$

so that

$$\lim_{n \to \infty} f(x^{(n)}) = \sum_{k=1}^{\infty} |f_k| = \|\bar f\|. \qquad (5)$$

It follows from (4) and (5) that

$$\|f\| \ge \|\bar f\| \qquad (6)$$

(why?). Comparing (3) and (6), we get

$$\|\bar f\| = \|f\|.$$
Thus the mapping carrying $\bar f$ into f is a “norm-preserving” mapping of $l_1$ into $c_0^*$. We must still verify that this mapping is one-to-one and “onto” (see p. 5), i.e., that every functional $f \in c_0^*$ has a unique representation of the form (2), where $\bar f = (f_1, \dots, f_k, \dots) \in l_1$. Let $x = (x_1, \dots, x_k, \dots) \in c_0$. Then

$$x = \sum_{k=1}^{\infty} x_k e_k,$$

where the series on the right converges in $c_0$ to the element x, since

$$\left\| x - \sum_{k=1}^{n} x_k e_k \right\| = \sup_{k > n} |x_k| \to 0$$

as n → ∞. Since the functional $f \in c_0^*$ is continuous,

$$f(x) = \sum_{k=1}^{\infty} x_k f(e_k)$$

(where is the continuity used?). Hence f has a unique representation of the form (2), and we need only verify that

$$\sum_{k=1}^{\infty} |f(e_k)| < \infty. \qquad (7)$$

This time let

$$x^{(n)} = \sum_{k=1}^{n} \frac{f(e_k)}{|f(e_k)|}\,e_k$$

(again with $f(e_k)/|f(e_k)| = 0$ if $f(e_k) = 0$). Noting that $x^{(n)} \in c_0$ and $\|x^{(n)}\| \le 1$, we find that

$$\sum_{k=1}^{n} |f(e_k)| = \sum_{k=1}^{n} \frac{f(e_k)}{|f(e_k)|}\,f(e_k) = f(x^{(n)}) \le \|f\|.$$

But this implies (7), since n can be made arbitrarily large.
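The key step in both halves of Example 2 is the test vector $x^{(n)}$. The Python sketch below mirrors it numerically under assumptions of our own: real sequences, and an $l_1$ element $\bar f$ truncated to finitely many terms (the sample $f_k = (-1/2)^k$, followed by one zero term to exercise the convention $f_k/|f_k| = 0$, is arbitrary). It lets one check that $\|x^{(n)}\| \le 1$ while $f(x^{(n)}) = \sum_{k \le n} |f_k|$, so that $\|f\|$ is at least every partial sum of $\|\bar f\|$.

```python
def sign(v):
    """f_k / |f_k|, with the convention 0 / |0| = 0 used in the text."""
    return (v > 0) - (v < 0)


def functional_from_l1(fbar):
    """The functional (2), f(x) = sum_k x_k f_k, for finitely supported x."""
    return lambda x: sum(xk * fk for xk, fk in zip(x, fbar))


def x_n(fbar, n):
    """x^(n) = sum_{k <= n} (f_k / |f_k|) e_k, listed by its first n coordinates."""
    return [sign(fk) for fk in fbar[:n]]


def sup_norm(x):
    return max((abs(v) for v in x), default=0.0)


fbar = [(-0.5) ** k for k in range(1, 30)] + [0.0]   # truncated l1 element, ending with f_30 = 0
f = functional_from_l1(fbar)
for n in (1, 5, 30):
    x = x_n(fbar, n)
    print(n, sup_norm(x), f(x), sum(abs(fk) for fk in fbar[:n]))
```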

Whether or not the original space E is complete, we have

Theorem 1. The conjugate space $(E^*, \|\cdot\|)$ is complete.

Proof. Let $\{f_n\}$ be a fundamental sequence of functionals in E*. Then, given any ε > 0, there is an integer N such that n, n' > N implies

$$\|f_n - f_{n'}\| < \varepsilon,$$

so that

$$|f_n(x) - f_{n'}(x)| \le \|f_n - f_{n'}\|\,\|x\| \le \varepsilon\,\|x\|$$

for every $x \in E$. Therefore the sequence $\{f_n(x)\}$ is fundamental and hence convergent for every $x \in E$. Let

$$f(x) = \lim_{n \to \infty} f_n(x).$$

Then f is linear, since

$$f(\alpha x + \beta y) = \lim_{n \to \infty} f_n(\alpha x + \beta y) = \lim_{n \to \infty} [\alpha f_n(x) + \beta f_n(y)] = \alpha f(x) + \beta f(y).$$

Moreover, choosing n so large that $\|f_n - f_{n+p}\| < 1$ for all p > 0, we have $\|f_{n+p}\| < \|f_n\| + 1$ for all p > 0, and hence

$$|f_{n+p}(x)| \le (\|f_n\| + 1)\,\|x\|.$$

It follows that

$$\lim_{p \to \infty} |f_{n+p}(x)| = |f(x)| \le (\|f_n\| + 1)\,\|x\|,$$

so that f is bounded and hence continuous.

To complete the proof, we now show that the functional f is the limit of the sequence $\{f_n\}$, i.e., that

$$\lim_{n \to \infty} \|f_n - f\| = 0. \qquad (8)$$
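Given any ε > 0, let n be so large that

$$\|f_n - f_{n+p}\| < \frac{\varepsilon}{3} \qquad (9)$$

for all p > 0. By the definition of the norm in E*, there is a nonzero element $x_{n,\varepsilon} \in E$ such that

$$\|f_n - f\| \le \frac{|f_n(x_{n,\varepsilon}) - f(x_{n,\varepsilon})|}{\|x_{n,\varepsilon}\|} + \frac{\varepsilon}{3} = |f_n(u_{n,\varepsilon}) - f(u_{n,\varepsilon})| + \frac{\varepsilon}{3},$$

where

$$u_{n,\varepsilon} = \frac{x_{n,\varepsilon}}{\|x_{n,\varepsilon}\|}.$$

Therefore

$$\|f_n - f\| \le |f_n(u_{n,\varepsilon}) - f_{n+p}(u_{n,\varepsilon})| + |f_{n+p}(u_{n,\varepsilon}) - f(u_{n,\varepsilon})| + \frac{\varepsilon}{3}$$

$$\le \|f_n - f_{n+p}\|\,\|u_{n,\varepsilon}\| + |f_{n+p}(u_{n,\varepsilon}) - f(u_{n,\varepsilon})| + \frac{\varepsilon}{3},$$

or

$$\|f_n - f\| < |f_{n+p}(u_{n,\varepsilon}) - f(u_{n,\varepsilon})| + \frac{2\varepsilon}{3} \qquad (10)$$

after using (9) and the fact that $\|u_{n,\varepsilon}\| = 1$. But

$$f_{n+p}(u_{n,\varepsilon}) \to f(u_{n,\varepsilon}) \quad \text{as } p \to \infty$$

by the very definition of f. Hence, taking the limit as p → ∞ in (10), we get

$$\|f_n - f\| \le \frac{2\varepsilon}{3} < \varepsilon,$$

which implies (8), since ε > 0 is arbitrary. ∎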

Next we examine the structure of the space conjugate to a Hilbert space: 

Theorem 2. Let H be a real Hilbert space. Then , given any x 0 e H, 
the formula 

fix) = (x, x 0 ) (x G H) (11) 

defines a continous linear functional on H, with ||/|| = ||x 0 ||. Conversely, 
given any continuous linear functional f on H, there is a unique element 
x 0 e H such that (11) holds, with ||xj = \\f\\. 

Proof. Given any x 0 eH, formula (11) obviously defines a linear 
functional on H. By Schwarz’s inequality, 

1/0)1 = IO,*o)l < IOII lOoll. (12) 


so that / is bounded and hence continuous. Moreover ||/|| = ||x 0 ||> 
because of ( 12 ) and the fact that /O 0 ) = JOoll 2 - 

Conversely, let $f$ be any continuous linear functional on $H$. If $f = 0$, then $f$ obviously has the representation (11) with $x_0 = 0$ (in this case $\|x_0\| = \|f\| = 0$). Otherwise, let

$$H_0 = \{x : f(x) = 0\}$$

be the null space of $f$. Since $f$ is continuous, $H_0$ is a closed subspace of $H$. According to Theorem 3, Corollary 2, p. 126, the codimension of the null space of any nontrivial linear functional $f$ equals 1. Therefore, by Theorem 14, Corollary 2, p. 159, the orthogonal complement $H_0^\perp$ of the space $H_0$ is one-dimensional, i.e., there exists a nonzero vector $y_0$ orthogonal to $H_0$ such that every vector $x \in H$ has a unique representation of the form

$$x = y + \lambda y_0, \tag{13}$$

where $y \in H_0$. Clearly, there is no loss of generality in assuming that $\|y_0\| = 1$. Now let

$$x_0 = f(y_0)\,y_0. \tag{14}$$

Then, given any $x \in H$, we have

$$f(x) = f(y + \lambda y_0) = \lambda f(y_0)$$

because of (13), and, since $(y, x_0) = 0$ for $y \in H_0$,

$$(x, x_0) = \lambda (y_0, x_0) = \lambda f(y_0)(y_0, y_0) = \lambda f(y_0)$$

because of (14). Therefore (11) holds for all $x \in H$. To prove the uniqueness of $x_0$, suppose

$$f(x) = (x, x_0') \qquad (x \in H). \tag{11'}$$

Then, subtracting (11') from (11), we get

$$(x, x_0 - x_0') = 0 \qquad (x \in H),$$

which immediately implies $x_0' = x_0$ after choosing $x = x_0 - x_0'$. ∎
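As a finite-dimensional check of Theorem 2 (cf. Problem 5), the following Python sketch works in $R^3$ with the standard scalar product. The functional `f` and the helper names `dot` and `riesz_vector` are hypothetical, chosen only for illustration. It builds $x_0 = \sum_i f(e_i)\,e_i$ from an orthonormal basis, then checks formula (11), the construction (14) with the unit vector $y_0 = x_0/\|x_0\|$, and the identity $\|f\| = \|x_0\|$ at the extremal point $x = x_0$.

```python
import math

def dot(x, y):
    """Standard scalar product (x, y) in R^n."""
    return sum(a * b for a, b in zip(x, y))

def riesz_vector(f, n):
    """Return x0 = sum_i f(e_i) e_i for the standard orthonormal basis e_1..e_n.
    By linearity of f, f(x) = (x, x0) for every x."""
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [f(e) for e in basis]

# Hypothetical linear functional on R^3, used only for illustration.
def f(x):
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]

x0 = riesz_vector(f, 3)
norm_x0 = math.sqrt(dot(x0, x0))

# Unit vector y0 orthogonal to the null space {x : f(x) = 0}; formula (14): x0 = f(y0) y0.
y0 = [c / norm_x0 for c in x0]
x0_from_14 = [f(y0) * c for c in y0]

# |f(x)| <= ||x|| ||x0|| by Schwarz's inequality, with equality at x = x0, so ||f|| = ||x0||.
ratio_at_x0 = abs(f(x0)) / norm_x0

print(x0)  # [2.0, -1.0, 0.5]
```

In finite dimensions the orthonormal-basis formula gives $x_0$ directly; the check on `x0_from_14` confirms that it agrees with the null-space construction used in the proof.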

Corollary. The correspondence $x_0 \leftrightarrow f$ is an isomorphism between $H$ and $H^*$, regarded as normed linear spaces.

Proof. If

$$f(x) = (x, x_0), \qquad g(x) = (x, y_0),$$

then

$$\alpha f(x) + \beta g(x) = (x, \alpha x_0 + \beta y_0).$$

Moreover, $\|x_0\| = \|f\|$. ∎

19.3. The strong topology in the conjugate space. Let $E$ be a normed linear space. Then, as we have seen, the conjugate space $E^*$ is itself a normed linear space, and a typical neighborhood of zero in $E^*$ is the set of all continuous linear functionals on $E$ satisfying the condition $\|f\| < \varepsilon$ for some $\varepsilon > 0$. In other words, for a neighborhood base at zero in the space $E^*$ we can take the sets of all functionals in $E^*$ such that $|f(x)| < \varepsilon$ when $x$ ranges over the closed unit sphere $\|x\| \le 1$ in the space $E$. Suppose $E$ is a topological linear space, but not a normed linear space. Then in defining the topology in $E^*$ it seems natural to start from an arbitrary bounded set $A \subset E$, since there is no longer a "unit sphere." This suggests

Definition 2. Let $E$ be a topological linear space, with conjugate space $E^*$. Then by the strong topology⁵ in $E^*$ is meant the topology generated by the neighborhood base at zero consisting of all sets of the form

$$U_{A,\varepsilon} = \{f : |f(x)| < \varepsilon \text{ for all } x \in A\} \tag{15}$$

for some number $\varepsilon > 0$ and bounded set $A \subset E$.⁶

⁵ As opposed to the weak topology in $E^*$, to be discussed in Sec. 20.3.

⁶ See Problem 8.

Regardless of the topology in the original space $E$, we have

Theorem 3. The conjugate space $E^*$, equipped with the strong topology, is a locally convex $T_1$-space.

Proof. If $f_0 \in E^*$ and $f_0 \ne 0$, then there is an element $x_0 \in E$ such that $f_0(x_0) \ne 0$. Let

$$\varepsilon = \tfrac{1}{2}|f_0(x_0)|, \qquad A = \{x_0\}.$$

Then clearly $f_0 \notin U_{A,\varepsilon}$, and hence $E^*$ is a $T_1$-space. To verify that the strong topology in $E^*$ is locally convex, we need only note that $U_{A,\varepsilon}$ is a convex set in $E^*$ for any $\varepsilon > 0$ and any bounded set $A \subset E$. ∎

Remark. The strong topology in $E^*$ will be denoted by the symbol $b$. In cases where we want to emphasize that $E^*$ is equipped with the strong topology, we will write $(E^*, b)$ instead of $E^*$.

19.4. The second conjugate space. Since the set of all continuous linear functionals on a topological linear space $E$ is itself a topological linear space, namely the conjugate space $(E^*, b)$, we can also talk about the second conjugate space $E^{**} = (E^*)^*$, i.e., the set of all continuous linear functionals on $E^*$, the third conjugate space $E^{***} = (E^{**})^*$, and so on.

Theorem 4. Given a topological linear space $E$ with conjugate space $E^*$, let $x_0$ be any fixed element of $E$. Then

$$\varphi_{x_0}(f) = f(x_0)$$

is a continuous linear functional on $E^*$.

Proof. The linearity is obvious, since

$$\varphi_{x_0}(\alpha f + \beta g) = \alpha f(x_0) + \beta g(x_0) = \alpha \varphi_{x_0}(f) + \beta \varphi_{x_0}(g) \qquad (f, g \in E^*).$$

As for the continuity, given any $\varepsilon > 0$, let $A$ be a bounded subset of $E$ containing $x_0$, and let $U_{A,\varepsilon}$ be the neighborhood (15). Then

$$|\varphi_{x_0}(f)| = |f(x_0)| < \varepsilon \quad \text{if } f \in U_{A,\varepsilon},$$

i.e., the functional $\varphi_{x_0}$ is continuous at 0 and hence continuous on the whole space $E^*$. ∎

Thus the mapping

$$\pi(x) = \varphi_x, \quad \text{i.e.,} \quad \pi(x)(f) = f(x) \qquad (f \in E^*),$$

called the natural mapping of $E$ into $E^{**}$, is a mapping of the whole space $E$ onto some subset $\pi(E)$ of the second conjugate space $E^{**}$. Clearly $\pi$ is linear, in the sense that

$$\pi(\alpha x + \beta y)(f) = f(\alpha x + \beta y) = \alpha f(x) + \beta f(y) = \alpha \pi(x)(f) + \beta \pi(y)(f) \qquad (f \in E^*).$$

Suppose $E$ has sufficiently many continuous linear functionals, e.g., suppose $E$ is a normed linear space or a locally convex topological linear space satisfying the first axiom of separation.⁷ Then $\pi$ is one-to-one, since, given any two distinct elements $x_1, x_2 \in E$, there is a functional $f \in E^*$ such that $f(x_1) \ne f(x_2)$ and hence $\pi(x_1) \ne \pi(x_2)$. Being the conjugate space of $(E^*, b)$, $E^{**}$ can also be equipped with a strong topology (introduced by the obvious analogue of Definition 2), which we denote by $b^*$.

If $\pi(E) = E^{**}$, the space $E$ is said to be semireflexive. It can be shown (see Problem 9) that the inverse mapping $\pi^{-1}$ carrying $\pi(E)$ into $E$ is always continuous. If $E$ is semireflexive and if $\pi$ (as well as $\pi^{-1}$) is continuous, the space $E$ is said to be reflexive, and $\pi$ then establishes a homeomorphism between the space $E$ and $(E^{**}, b^*)$. In this case, each element $x \in E$ can be identified with the corresponding element $\pi(x) \in E^{**}$, and hence it is convenient to denote the value of a functional $f \in E^*$ at the point $x \in E$ by the more symmetric notation

$$f(x) = (f, x).$$

Thus $(f, x)$ can be regarded as a functional on $E$ for each fixed $f \in E^*$, and as a functional on $E^*$ for each fixed $x \in E$ (in the latter case, $x$ also acts like an element of $E^{**}$).

Theorem 5. If $E$ is a normed linear space (so that in particular $E^*$ and $E^{**}$ are also normed linear spaces), then the natural mapping of $E$ into $E^{**}$ is an isometry.

⁷ Recall Problem 8, p. 183.

Proof. Given an element $x \in E$, let $\|x\|$ denote the norm of $x$ in $E$ and $\|x\|_2$ the norm of its image in $E^{**}$. We want to show that $\|x\| = \|x\|_2$. To this end, let $f$ be any element of $E^*$. Then

$$|(f, x)| \le \|f\|\,\|x\|,$$

i.e.,

$$\|x\| \ge \frac{|(f, x)|}{\|f\|} \qquad (f \ne 0),$$

and since the left-hand side is independent of $f$,

$$\|x\| \ge \sup_{f \in E^*,\ f \ne 0} \frac{|(f, x)|}{\|f\|} = \|x\|_2. \tag{16}$$

On the other hand, by the Hahn-Banach theorem, for every $x_0 \in E$ there is a continuous linear functional $f_0$ such that

$$|(f_0, x_0)| = \|f_0\|\,\|x_0\|. \tag{17}$$

In fact, if $x_0 \ne 0$ (the case $x_0 = 0$ being trivial), to construct such a functional we need only set $f_0(x) = \lambda$ for any element of the form $x = \lambda x_0$, and then extend $f_0$ to a functional on the whole space $E$ (without changing its norm). It follows from (17) that

$$\|x_0\|_2 = \sup_{f \in E^*,\ f \ne 0} \frac{|(f, x_0)|}{\|f\|} \ge \frac{|(f_0, x_0)|}{\|f_0\|} = \|x_0\|. \tag{18}$$

Comparing (16) and (18), we get $\|x\| = \|x\|_2$. ∎
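In a concrete space the functional $f_0$ of (17) can be written down explicitly. The sketch below assumes the real space $R^n$ with the norm $\|x\|_p = (\sum_k |x_k|^p)^{1/p}$, $1 < p < \infty$, whose conjugate norm is $\|f\|_q$ with $1/p + 1/q = 1$ (the content of Problem 1b below, taken here as given). The choice $f_0^k = \operatorname{sign}(x_k)\,|x_k|^{p-1}$ is the standard equality case of Hölder's inequality, swapped in for the Hahn-Banach extension used in the text; the helper names are hypothetical.

```python
import math

def lp_norm(v, p):
    """(sum_k |v_k|^p)^(1/p) for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def pairing(f, x):
    """Value (f, x) = sum_k f^k x_k of the functional with coordinates f at x."""
    return sum(a * b for a, b in zip(f, x))

def norming_functional(x, p):
    """Coordinates sign(x_k)|x_k|^(p-1): equality holds in Hoelder's inequality for this f0."""
    return [math.copysign(abs(t) ** (p - 1.0), t) for t in x]

p = 3.0
q = p / (p - 1.0)          # conjugate exponent, 1/p + 1/q = 1
x0 = [1.0, -2.0, 0.5, 3.0]
f0 = norming_functional(x0, p)

lhs = abs(pairing(f0, x0))               # |(f0, x0)|
rhs = lp_norm(f0, q) * lp_norm(x0, p)    # ||f0|| ||x0||, the two sides of (17)
norm_in_E2 = lhs / lp_norm(f0, q)        # the lower bound for ||x0||_2 in (18)
```

Here `lhs` and `rhs` agree up to rounding, and `norm_in_E2` reproduces $\|x_0\|_p$, which is the inequality (18) attained with equality.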

Corollary. The concepts of semireflexivity and reflexivity coincide for a normed linear space.

Proof. If the natural mapping $\pi$ is an isometry, then obviously both $\pi$ and $\pi^{-1}$ are continuous. ∎

Remark. According to Theorem 5, every normed linear space $E$ is isometric to the linear manifold $\pi(E) \subset E^{**}$.⁸ Identifying $E$ with $\pi(E)$, we can assert that $E \subset E^{**}$ in general, and $E = E^{**}$ if $E$ is reflexive (or semireflexive).

Theorem 6. Every reflexive normed linear space is complete.

Proof. If $E$ is reflexive, then $E = E^{**}$. But $E^{**} = (E^*)^*$ is complete, by Theorem 1, p. 187. ∎


⁸ The set $\pi(E)$ need not be closed.


sec. 19 


THE CONJUGATE SPACE 193 


Example 1. Finite-dimensional Euclidean spaces and Hilbert space are the simplest examples of reflexive spaces (in fact, for such spaces $E = E^*$). This follows from Theorem 2 (cf. Problem 5).

Example 2. The space $c_0$ of all sequences $x = (x_1, \ldots, x_k, \ldots)$ converging to zero is an example of a complete nonreflexive space. In fact, as we saw in Example 2, p. 185, the conjugate space of $c_0$ is the space $l_1$ of all absolutely summable sequences, which in turn has the space $m$ of all bounded sequences (not necessarily converging to zero) as its conjugate space (see Problem 2c).

Example 3. It can be shown that the space $C_{[a,b]}$ of all continuous functions on $[a, b]$ is nonreflexive, and even that there is no normed linear space with $C_{[a,b]}$ as its conjugate space.

Example 4. The space $l_p$, where $1 < p < \infty$ and $p \ne 2$, is an example of a reflexive space which does not coincide with its conjugate space. In fact, $l_p^* = l_q$, where

$$\frac{1}{p} + \frac{1}{q} = 1,$$

and hence $l_p^{**} = l_q^* = l_p$.

Problem 1. Let $E$ be Euclidean $n$-space (real or complex), and let $e_1, \ldots, e_n$ be a basis in $E$. Let $x_1, \ldots, x_n$ be the coordinates of a vector $x \in E$ with respect to the basis $e_1, \ldots, e_n$, and let $f^1, \ldots, f^n$ be the coordinates of a functional $f \in E^*$ with respect to the dual basis $f_1, \ldots, f_n$. Prove that in each of the following pairs, the norm in $E^*$ is the norm "induced" by the corresponding norm in $E$:

a) $\|x\| = \left(\sum_{k=1}^n |x_k|^2\right)^{1/2}, \quad \|f\| = \left(\sum_{k=1}^n |f^k|^2\right)^{1/2};$

b) $\|x\| = \left(\sum_{k=1}^n |x_k|^p\right)^{1/p}, \quad \|f\| = \left(\sum_{k=1}^n |f^k|^q\right)^{1/q},$ where $\frac{1}{p} + \frac{1}{q} = 1$ $(p, q > 1)$;

c) $\|x\| = \sup_{1 \le k \le n} |x_k|, \quad \|f\| = \sum_{k=1}^n |f^k|;$

d) $\|x\| = \sum_{k=1}^n |x_k|, \quad \|f\| = \sup_{1 \le k \le n} |f^k|.$
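Parts c) and d) can be checked numerically in the real case: $|f(x)|$ is a convex function of $x$, so its supremum over a unit ball that is a polytope is attained at a vertex. The sketch below (real coordinates only; the function names are hypothetical) enumerates the vertices $(\pm 1, \ldots, \pm 1)$ of the unit ball of the norm $\sup_k |x_k|$ and the vertices $\pm e_k$ of the unit ball of the norm $\sum_k |x_k|$, and compares the results with the norms in $E^*$ stated in c) and d).

```python
from itertools import product

def pairing(f, x):
    """f(x) = sum_k f^k x_k in coordinates relative to a basis and its dual basis."""
    return sum(a * b for a, b in zip(f, x))

def sup_over_cube(f):
    """sup |f(x)| over {x : max_k |x_k| <= 1}, taken over the 2^n vertices (+-1, ..., +-1)."""
    return max(abs(pairing(f, x)) for x in product((-1.0, 1.0), repeat=len(f)))

def sup_over_cross(f):
    """sup |f(x)| over {x : sum_k |x_k| <= 1}, taken over the 2n vertices +-e_k."""
    n = len(f)
    vertices = [[s if j == k else 0.0 for j in range(n)]
                for k in range(n) for s in (-1.0, 1.0)]
    return max(abs(pairing(f, x)) for x in vertices)

f = [0.5, -2.0, 1.25]
print(sup_over_cube(f), sum(abs(t) for t in f))   # 3.75 3.75   (part c)
print(sup_over_cross(f), max(abs(t) for t in f))  # 2.0 2.0     (part d)
```

The vertex enumeration costs $2^n$ evaluations for part c), so this is only a sanity check for small $n$, not a proof.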

Problem 2. Let $l_p$ be the normed linear space of all sequences $x = (x_1, \ldots, x_k, \ldots)$ with norm

$$\|x\| = \left(\sum_{k=1}^\infty |x_k|^p\right)^{1/p} \qquad (p \ge 1).$$

Prove that

a) If $p > 1$, the space $l_p^*$ conjugate to $l_p$ is isomorphic to the space $l_q$, where

$$\frac{1}{p} + \frac{1}{q} = 1;$$

b) If $p > 1$, the general form of a continuous linear functional on $l_p$ is

$$f(x) = \sum_{k=1}^\infty x_k f_k,$$

where $x = (x_1, \ldots, x_k, \ldots) \in l_p$, $f = (f_1, \ldots, f_k, \ldots) \in l_q$;

c) If $p = 1$, $l_1^*$ is isomorphic to the space $m$ of all bounded sequences $x = (x_1, \ldots, x_k, \ldots)$ with norm $\|x\| = \sup_k |x_k|$.

Problem 3. Let $E$ be an incomplete normed linear space, with completion $\bar{E}$. Prove that the conjugate spaces $E^*$ and $(\bar{E})^*$ are isomorphic.

Hint. Given any $f \in E^*$, extend $f$ by continuity to a functional $\bar{f} \in (\bar{E})^*$. Conversely, given any $\bar{f} \in (\bar{E})^*$, let $f$ be the restriction of $\bar{f}$ to $E$, namely the functional $f(x) = \bar{f}(x)$ for all $x \in E$. Show that $f \leftrightarrow \bar{f}$ is the desired isomorphism (with $\|f\| = \|\bar{f}\|$).

Problem 4. Let E be an incomplete Euclidean space with the Hilbert 
space H as its completion. Prove that E* and H are isomorphic. 

Problem 5. Particularize Theorem 2 to the case of a finite-dimensional 
Euclidean space. 

Problem 6. Generalize Theorem 2 to the case of a complex Hilbert space.

Hint. Write $x_0 = \overline{f(y_0)}\,y_0$ instead of (14). The isomorphism of $H$ and $H^*$ associating the functional $f(x) = (x, x_0)$ with $x_0$ is then "conjugate-linear" in the sense that $\alpha f$ is associated with $\bar{\alpha} x_0$.

Problem 7. Let $\Phi$ be the same countably normed space of "rapidly decreasing sequences" as in Problem 12c, p. 172. Find the conjugate space $\Phi^*$.

Hint. Use Problem 6, p. 182.

Ans. $\Phi^*$ is the space of all functionals $f$ of the form

$$f(x) = \sum_{k=1}^\infty x_k f_k,$$

where $f = (f_1, \ldots, f_k, \ldots)$ is any sequence satisfying the condition

$$\sum_{k=1}^\infty k^{-n} f_k^2 < \infty$$

for some nonnegative integer $n$.


Problem 8. Let $E$, $E^*$, and $U_{A,\varepsilon}$ be the same as in Definition 2. Verify that the system of sets $U_{A,\varepsilon}$ actually generates a topology $b$ in $E^*$ such that the linear operations in $E^*$ are continuous with respect to $b$. Prove that if $E$ is a normed linear space, then $b$ coincides with the "norm topology" of Sec. 19.2.

Problem 9. Let $E$ be a topological linear space, and let $b^*$ be the strong topology in $E^{**}$ and $\pi$ the natural mapping of $E$ into $E^{**}$. Prove that $\pi^{-1}$ is continuous.

Hint. The topology $b^*$ induces a topology $\pi^{-1}(b^*)$ in the space $E$, in which a set $G \subset E$ is said to be open if its image $\pi(G)$ is the intersection of $\pi(E)$ with an open subset of $(E^{**}, b^*)$. Show that $\pi^{-1}(b^*)$ is stronger than the original topology in $E$.

Problem 10. Prove that every closed subspace of a reflexive space is itself 
reflexive. 


20. The Weak Topology and Weak Convergence 

20.1. The weak topology in a topological linear space. Let $E$ be a topological linear space, with conjugate space $E^*$. Given any $\varepsilon > 0$ and any finite set of continuous linear functionals $f_1, \ldots, f_r \in E^*$, the set

$$U = U_{f_1, \ldots, f_r; \varepsilon} = \{x : |f_1(x)| < \varepsilon, \ldots, |f_r(x)| < \varepsilon\} \tag{1}$$

is open in $E$ and contains the point zero, i.e., $U$ is a neighborhood of zero. Let $\mathcal{U}_0$ be the system of all sets of the form (1). Then $\mathcal{U}_0$ is a neighborhood base at zero, generating a topology in $E$ which is again the topology of a topological linear space (the details are left as an exercise). This topology is called the weak topology in $E$. Every subset of $E$ which is open in the weak topology is also open in the original topology of $E$, but the converse may not be true, i.e., $\mathcal{U}_0$ may not be a neighborhood base at zero for the original topology in $E$. In other words, the weak topology is weaker (as defined on p. 80) than the original topology, as anticipated by the terminology. Clearly, the weak topology in $E$ is the weakest topology $\tau$ with the property that every linear functional continuous with respect to the original topology is also continuous with respect to $\tau$.

20.2. Weak convergence. The weak topology in $E$ may not satisfy the first axiom of countability, even in the case where $E$ is a normed linear space. Hence the weak topology cannot in general be described in the language of convergent sequences. Nevertheless, the weak topology determines an important kind of convergence in $E$, called weak convergence. By contrast, the convergence in $E$ determined by the original topology (by the norm, if $E$ is a normed linear space) is called strong convergence.

Theorem 1. A sequence $\{x_n\}$ of elements in a topological linear space $E$ is weakly convergent to an element $x_0 \in E$ if and only if the numerical sequence $\{f(x_n)\}$ converges to $f(x_0)$ for every $f \in E^*$, i.e., for every continuous linear functional $f$ on $E$.

Proof. Clearly, there is no loss of generality in assuming that $x_0 = 0$. Suppose $f(x_n) \to 0$ for every $f \in E^*$. Then, given any "weak neighborhood" (1), let $N_i$ be such that $|f_i(x_n)| < \varepsilon$ for all $n > N_i$ $(i = 1, \ldots, r)$, and let $N = \max\{N_1, \ldots, N_r\}$. Then $x_n \in U$ for all $n > N$, i.e., $\{x_n\}$ converges to $x_0$ in the weak topology.

Conversely, suppose that for each neighborhood (1), there is an integer $N = N(U)$ such that $x_n \in U$ for all $n > N$. Then obviously $f(x_n) \to 0$ for any given $f \in E^*$, as we see by choosing $f$ to be one of the functionals $f_1, \ldots, f_r$ figuring in the definition of $U$. ∎

Specializing to the case where $E$ is a normed linear space, we have

Theorem 2. Let $\{x_n\}$ be a weakly convergent sequence of elements in a normed linear space $E$. Then $\{x_n\}$ is bounded, i.e., there is a constant $C$ such that

$$\|x_n\| \le C \qquad (n = 1, 2, \ldots).$$

Proof. Suppose $\{x_n\}$ is unbounded. Then $\{x_n\}$ is unbounded on every closed sphere

$$S[f_0, \varepsilon] = \{f : \|f - f_0\| \le \varepsilon\}$$

in $E^*$, in the sense that the set of numbers

$$\{(f, x_n) : f \in S[f_0, \varepsilon],\ n = 1, 2, \ldots\}$$

is unbounded for every $S[f_0, \varepsilon] \subset E^*$. In fact, if the sequence $\{x_n\}$ is bounded on $S[f_0, \varepsilon]$, then it is also bounded on the sphere

$$S[0, \varepsilon] = \{g : \|g\| \le \varepsilon\},$$

since if $g \in S[0, \varepsilon]$, then

$$f_0 + g \in S[f_0, \varepsilon], \qquad (g, x_n) = (f_0 + g, x_n) - (f_0, x_n),$$

where the numbers $(f_0, x_n)$ are bounded, by the weak convergence of $\{x_n\}$. But if $|(g, x_n)| \le C$ for all $g \in S[0, \varepsilon]$, then, by the isometry of the natural mapping of $E$ into $E^{**}$,

$$\|x_n\| = \sup_{\|g\| \le 1} |(g, x_n)| = \frac{1}{\varepsilon} \sup_{\|g\| \le \varepsilon} |(g, x_n)| \le \frac{C}{\varepsilon} \qquad (n = 1, 2, \ldots),$$

so that $\{x_n\}$ is bounded, contrary to assumption. It follows that if $\{x_n\}$ is unbounded, then $\{x_n\}$ is unbounded on every closed sphere in $E^*$.

Next, choosing any closed sphere $S_0 \subset E^*$, we find an integer $n_1$ and an element $f \in S_0$ such that

$$|(f, x_{n_1})| > 1. \tag{2}$$

Since $(f, x_{n_1})$ depends continuously on $f$, the inequality (2) holds for all $f$ belonging to some closed sphere $S_1 \subset S_0$. Repeating this argument, we find an integer $n_2$ and a closed sphere $S_2 \subset S_1$ such that

$$|(f, x_{n_2})| > 2$$

for all $f \in S_2$, and so on, where in general there is an integer $n_k$ and a closed sphere $S_k \subset S_{k-1}$ such that

$$|(f, x_{n_k})| > k$$

for all $f \in S_k$. At the same time, we can obviously see to it that the radius of the sphere $S_k$ approaches zero as $k \to \infty$. Since $E^*$ is complete, by Theorem 1, p. 187, it follows from the nested sphere theorem (Theorem 2, p. 60) that there is an element $f$ contained in all the spheres $S_k$. But then

$$|(f, x_{n_k})| > k \qquad (k = 1, 2, \ldots),$$

contrary to the assumed weak convergence of the sequence $\{x_n\}$. ∎

Corollary 1. Let $\{x_n\}$ be a sequence of elements in a normed linear space $E$ such that the numerical sequence $\{(f, x_n)\}$ is bounded for every $f \in E^*$. Then $\{x_n\}$ is bounded.

Proof. In proving Theorem 2, the weak convergence of $\{x_n\}$ was invoked only to infer the boundedness of the numerical sequences $\{(f, x_n)\}$. ∎

Generalizing Corollary 1, we get

Corollary 2. Let $M$ be a weakly bounded subset of a normed linear space $E$, i.e., a subset bounded in the weak topology. Then $M$ is strongly bounded, i.e., $M$ is contained in some closed sphere.

Proof. Suppose $M$ contains a sequence $\{x_n\}$ such that $\|x_n\| \to \infty$, and let $M'$ be the set of all points $x_n$ $(n = 1, 2, \ldots)$. Since $M$ is weakly bounded, so is $M'$. This means that $M'$ is "absorbed" by any weak neighborhood of zero, in particular by any neighborhood

$$U = \{x : |(f, x)| < 1\} \qquad (f \in E^* \text{ fixed}),$$

in the sense that there is a number $\alpha > 0$ such that $M' \subset \alpha U$. But then $|(f, x_n)| < \alpha$ for all $n$, and since $f \in E^*$ is arbitrary, Corollary 1 shows that $\{x_n\}$ is bounded, contradicting the assumption that $\|x_n\| \to \infty$. ∎

Corollary 3. A necessary and sufficient condition for a subset $M$ of a normed linear space $E$ to be (strongly) bounded is that every continuous linear functional $f \in E^*$ be bounded on $M$.

Proof. The necessity follows at once from the inequality

$$|(f, x)| \le \|f\|\,\|x\|,$$

while the sufficiency is an immediate consequence of Corollary 2 and the meaning of weak boundedness. ∎

A useful test for weak convergence of a sequence is given by

Theorem 3. A bounded sequence $\{x_n\}$ of elements in a normed linear space $E$ is weakly convergent to an element $x \in E$ if $f(x_n) \to f(x)$ for every $f \in A$, where $A$ is any set whose linear hull is everywhere dense in $E^*$.

Proof. Let $\varphi$ be an arbitrary element of $E^*$, and let $\{\varphi_k\}$ be a sequence of linear combinations of elements of $A$ converging to $\varphi$ (such a sequence exists, since the linear hull of $A$ is everywhere dense in $E^*$). Let $C$ be such that

$$\|x\| \le C, \qquad \|x_n\| \le C \qquad (n = 1, 2, \ldots).$$

Moreover, given any $\varepsilon > 0$, choose $k$ so large that $\|\varphi - \varphi_k\| < \varepsilon$ (this is possible, since $\varphi_k \to \varphi$). Then

$$\begin{aligned}
|\varphi(x_n) - \varphi(x)| &\le |\varphi(x_n) - \varphi_k(x_n)| + |\varphi_k(x_n) - \varphi_k(x)| + |\varphi_k(x) - \varphi(x)| \\
&\le C\varepsilon + C\varepsilon + |\varphi_k(x_n) - \varphi_k(x)|.
\end{aligned} \tag{3}$$

But $\varphi_k(x_n) \to \varphi_k(x)$ as $n \to \infty$, since $\varphi_k$ is a linear combination of elements of $A$, and $f(x_n) \to f(x)$ for every $f \in A$, by hypothesis. Therefore we can make the right-hand side of (3) as small as we please, by choosing $\varepsilon$ sufficiently small and $n$ sufficiently large. It follows that $\varphi(x_n) \to \varphi(x)$ for every $\varphi \in E^*$, i.e., $\{x_n\}$ converges weakly to $x$. ∎

The meaning of weak convergence in various spaces is illustrated by the following examples:

Example 1. Given a finite-dimensional Euclidean space $R^n$, let $e_1, \ldots, e_n$ be any orthonormal basis in $R^n$, and let $\{x^{(k)}\}$ be a sequence in $R^n$ converging weakly to a vector $x = (x_1, \ldots, x_n) \in R^n$. Then

$$(x^{(k)}, e_j) = x_j^{(k)} \to (x, e_j) = x_j \qquad (j = 1, \ldots, n),$$

i.e., for every $j$ the sequence $\{x_j^{(k)}\}$ of components of the vectors $x^{(k)}$ converges to the corresponding component of the limit vector $x$. But then

$$\rho(x^{(k)}, x) = \left(\sum_{j=1}^n \left(x_j^{(k)} - x_j\right)^2\right)^{1/2} \to 0$$

as $k \to \infty$, so that $\{x^{(k)}\}$ converges strongly to $x$. On the other hand, strong convergence obviously implies weak convergence in any space. Thus we see that weak convergence and strong convergence are equivalent concepts in $R^n$.

Example 2. Let $\{x^{(k)}\}$ be a (strongly) bounded sequence of elements of $l_2$. Then $\{x^{(k)}\}$ converges weakly to an element $x \in l_2$ if

$$(x^{(k)}, e_j) = x_j^{(k)} \to (x, e_j) = x_j \qquad (j = 1, 2, \ldots),$$

where

$$e_1 = (1, 0, 0, \ldots), \quad e_2 = (0, 1, 0, \ldots), \quad \ldots$$

is an orthonormal basis in $l_2$. This follows from Theorem 3, since linear combinations of the elements $e_1, e_2, \ldots$ are everywhere dense in $l_2$, which coincides with its own conjugate space (recall Problem 2a, p. 194). Thus weak convergence in $l_2$ has the same interpretation in terms of components as in $R^n$, i.e., for every $j$ the sequence $\{x_j^{(k)}\}$ of components of the vectors $x^{(k)}$ converges to the corresponding component of the limit vector $x$. However, the concepts of weak convergence and strong convergence no longer coincide in $l_2$. In fact, although not strongly convergent (since $\|e_j - e_k\| = \sqrt{2}$ for $j \ne k$), the sequence of basis vectors $\{e_k\}$ converges weakly to zero. To see this, we note that by Theorem 2, p. 188, every continuous linear functional $f$ on $l_2$ can be written as a scalar product

$$f(x) = (x, a)$$

of a variable vector $x \in l_2$ with a fixed vector $a = (a_1, \ldots, a_n, \ldots) \in l_2$, so that in particular

$$f(e_k) = a_k.$$

But $a_k \to 0$ as $k \to \infty$ for every $a \in l_2$, and hence $f(e_k) \to 0 = f(0)$.
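The weak convergence $e_k \to 0$ and the entry index $N = \max\{N_1, \ldots, N_r\}$ from the proof of Theorem 1 can be made concrete. The sketch below assumes two hypothetical functionals on $l_2$, represented as $f_i(x) = (x, a^{(i)})$ with $a^{(1)}_k = 1/k$ and $a^{(2)}_k = k^{-3/4}$; both sequences are square-summable and decrease in $k$, which is what lets the scan stop at the first small term. Since $f_i(e_n) = a^{(i)}_n$, the code finds the least $N$ such that $e_n$ lies in the weak neighborhood $U_{f_1, f_2; \varepsilon}$ of (1) for every $n > N$, even though $\|e_n\| = 1$ for all $n$. The function names are illustrative only.

```python
def a1(k):
    return 1.0 / k           # sum of 1/k^2 converges, so a1 is in l_2

def a2(k):
    return k ** -0.75        # sum of k^(-3/2) converges, so a2 is in l_2

def f_at_basis_vector(a, n):
    """f(e_n) = (e_n, a) = a_n, since e_n has a single nonzero coordinate."""
    return a(n)

def entry_index(coeffs, eps, limit=10**6):
    """N = max_i N_i, where N_i is the least index with |f_i(e_n)| < eps for all n > N_i.
    Assumes each |a_i(n)| is nonincreasing in n, so the first small term suffices."""
    N = 0
    for a in coeffs:
        n = 1
        while abs(f_at_basis_vector(a, n)) >= eps:
            n += 1
            if n > limit:
                raise ValueError("no entry index found below the limit")
        N = max(N, n - 1)
    return N

eps = 0.01
N = entry_index([a1, a2], eps)
print(N)  # 464
```

The slower decay of $a^{(2)}$ dominates ($N_1 = 100$, $N_2 = 464$), which mirrors the role of the maximum in the proof of Theorem 1.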

Example 3. Consider the space $C_{[a,b]}$ of all functions continuous on $[a, b]$, and let $\{x_n(t)\}$ be a sequence of functions in $C_{[a,b]}$ converging weakly to a function $x(t) \in C_{[a,b]}$. Among the continuous linear functionals on $C_{[a,b]}$, we have the functionals $\delta_{t_0}$, $a \le t_0 \le b$ (see Example 5, p. 179), where $\delta_{t_0}$ assigns to each function $x(t) \in C_{[a,b]}$ its value at the fixed point $t_0$. Clearly,

$$\delta_{t_0}(x_n) \to \delta_{t_0}(x)$$

means that

$$x_n(t_0) \to x(t_0).$$

Hence, if the sequence $\{x_n(t)\}$ is weakly convergent, then

1) $\{x_n(t)\}$ is uniformly bounded on $[a, b]$, i.e., there is a constant $C$ such that $|x_n(t)| \le C$ for all $n = 1, 2, \ldots$ and all $t \in [a, b]$;⁹

2) $\{x_n(t)\}$ is pointwise convergent on $[a, b]$, i.e., $\{x_n(t)\}$ is a convergent numerical sequence for every fixed $t \in [a, b]$.




⁹ This follows from Theorem 2.

20.3. The weak topology and weak convergence in a conjugate space. Let $E$ be a topological linear space, with conjugate space $E^*$. Suppose that in Definition 2, p. 190, we require $A$ to be finite instead of bounded. Then the resulting topology, generated by the neighborhood base at zero consisting of all sets of the form

$$U_{A,\varepsilon} = \{f : |f(x)| < \varepsilon \text{ for all } x \in A\} \tag{4}$$

for some number $\varepsilon > 0$ and finite set $A \subset E$, is called the weak topology in $E^*$ instead of the strong topology. Clearly, the set (4) can also be written as

$$U_{x_1, \ldots, x_n; \varepsilon} = U_{A,\varepsilon} = \{f : |f(x_1)| < \varepsilon, \ldots, |f(x_n)| < \varepsilon\} \tag{4'}$$

for some $\varepsilon > 0$ and points $x_1, \ldots, x_n \in E$. Since every finite set $A \subset E$ is bounded, while in general there are bounded infinite sets in $E$, the weak topology in $E^*$ is in fact weaker than the strong topology in $E^*$ (and in general does not coincide with the strong topology).

The weak topology in $E^*$ determines a kind of convergence in $E^*$, called weak convergence (of functionals). Weak convergence of functionals plays an important role in many problems of functional analysis, in particular in the theory of generalized functions (to be discussed in the next section). Obviously, a sequence $\{f_n\}$ of functionals $f_n \in E^*$ is weakly convergent to a functional $f \in E^*$ if and only if $\{f_n(x)\}$ converges to $f(x)$ for every $x \in E$.

For weakly convergent sequences of functionals, we have the following analogues of Theorems 2 and 3:

Theorem 2'. Let $\{f_n\}$ be a weakly convergent sequence of continuous linear functionals on a Banach space $E$. Then $\{f_n\}$ is bounded, i.e., there is a constant $C$ such that

$$\|f_n\| \le C \qquad (n = 1, 2, \ldots).$$

Proof. The proof is the exact analogue of that of Theorem 2. Note that this time we must specify that $E$ is a complete normed linear space (i.e., a Banach space), since the nested sphere theorem is now applied in $E$ itself. ∎

Theorem 3'. A bounded sequence $\{f_n\}$ of continuous linear functionals on a Banach space $E$ is weakly convergent to a functional $f \in E^*$ if $f_n(x) \to f(x)$ for every $x \in A$, where $A$ is any set whose linear hull is everywhere dense in $E$.

Proof. The exact analogue of the proof of Theorem 3. ∎

Example. Let $E$ be the space $C_{[a,b]}$ of all functions continuous on $[a, b]$, and consider the functional

$$\delta_{t_0}(x) = x(t_0), \tag{5}$$

as in Example 3 above. For simplicity (and without loss of generality), we assume that $t_0 = 0 \in (a, b)$, so that (5) becomes

$$\delta_0(x) = x(0). \tag{6}$$

Let $\{f_n(t)\}$ be a sequence of functions continuous on $[a, b]$ such that¹⁰

1) $f_n(t)$ is positive if $|t| < \frac{1}{n}$ and zero if $|t| \ge \frac{1}{n}$,

2) $\int_a^b f_n(t)\,dt = 1$ for all $n = 1, 2, \ldots$,

and let

$$\delta^{(n)}(x) = \int_a^b f_n(t)\,x(t)\,dt.$$

Then $\delta^{(n)}$ is a continuous linear functional on $C_{[a,b]}$ (recall Example 4, p. 179). Moreover, given any function $x(t) \in C_{[a,b]}$, we have (for $n$ so large that $[-1/n, 1/n] \subset (a, b)$)

$$\delta^{(n)}(x) = \int_a^b f_n(t)\,x(t)\,dt = \int_{-1/n}^{1/n} f_n(t)\,x(t)\,dt = x(\tau)\int_{-1/n}^{1/n} f_n(t)\,dt = x(\tau)$$

for some $\tau \in [-1/n, 1/n]$, by the mean value theorem for integrals, and hence, by the continuity of $x(t)$ at $t = 0$,

$$\delta^{(n)}(x) \to x(0) = \delta_0(x) \tag{7}$$

as $n \to \infty$. Thus the sequence of functionals $\{\delta^{(n)}\}$ converges weakly to the functional $\delta_0$. Suppose we write (6) in the form

$$\delta_0(x) = \int_a^b \delta(t)\,x(t)\,dt,$$

in terms of the "delta function" $\delta(t)$, as in Example 3, p. 124. Then, loosely speaking, (7) says that "the generalized function $\delta(t)$ is the weak limit of the sequence of ordinary functions $f_n(t)$."
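The convergence (7) can be watched numerically. One possible choice of $f_n$ (ours, not the text's) is the "hat" function $f_n(t) = n(1 - n|t|)$ for $|t| < 1/n$ and $f_n(t) = 0$ otherwise, which is continuous, positive exactly on $|t| < 1/n$, and has integral 1. The Python sketch below approximates $\delta^{(n)}(x)$ by the midpoint rule on $[-1/n, 1/n]$, assuming $[-1/n, 1/n] \subset [a, b]$, for the sample function $x(t) = \cos t + t$, so that $\delta_0(x) = x(0) = 1$. The names `hat`, `delta_n`, and the step count `m` are hypothetical.

```python
import math

def hat(n, t):
    """f_n(t) = n(1 - n|t|) for |t| < 1/n, else 0: continuous, integral 1."""
    return max(0.0, n * (1.0 - n * abs(t)))

def delta_n(x, n, m=2000):
    """Midpoint-rule approximation of delta^(n)(x), the integral of f_n(t) x(t) dt
    over [-1/n, 1/n], which contains the support of f_n."""
    lo, hi = -1.0 / n, 1.0 / n
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        t = lo + (i + 0.5) * h
        total += hat(n, t) * x(t)
    return h * total

def x(t):
    return math.cos(t) + t   # sample continuous function with x(0) = 1

for n in (1, 10, 100, 1000):
    print(n, delta_n(x, n))
```

For this particular $x(t)$ the odd part $t$ integrates to zero against the even hat function, so the printed values approach 1 from below as $n$ grows, in line with (7).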

20.4. The weak* topology. There are two ways of regarding the space $E^*$ of continuous linear functionals on a given space $E$, either as the space conjugate to the original space $E$, or else as an "original space" in its own right, with conjugate space $E^{**}$. Correspondingly, there are two ways of introducing a weak topology into $E^*$, either by using neighborhoods of the form (4'), or else by using the values of functionals in $E^{**}$ on the space $E^*$, as in Sec. 20.1. Clearly, the two topologies will be the same if and only if $E$ is reflexive (why?). Suppose $E$ is nonreflexive. Then, to avoid confusion, the weak topology determined in $E^*$ with the aid of $E^{**}$ will be called simply the weak topology, while the topology determined in $E^*$ with the aid of $E$


¹⁰ As an exercise, give an explicit example of such a sequence $\{f_n(t)\}$.














will be called the weak* topology.¹¹ Clearly, the weak* topology in $E^*$ is weaker than the weak topology in $E^*$, i.e., the weak* topology has fewer open sets than the weak topology. Note that weak convergence as defined in Sec. 20.3 now means weak* convergence.

¹¹ Read "weak*" as "weak star."

The following theorem is important in various applications of the concept of weak convergence of functionals:

Theorem 4. Every bounded sequence $\{f_n\}$ of functionals in the space $E^*$ conjugate to a separable normed linear space $E$ contains a weakly* convergent subsequence.

Proof. Since $E$ is separable, there is a countable set of points $x_1, x_2, \ldots, x_n, \ldots$ everywhere dense in $E$. Suppose the sequence $\{f_n\}$ of functionals in $E^*$, i.e., continuous linear functionals on $E$, is bounded (in norm). Then the numerical sequence

$$f_1(x_1), f_2(x_1), \ldots, f_n(x_1), \ldots$$

is bounded, and hence, by the Bolzano-Weierstrass theorem (see p. 101), $\{f_n\}$ contains a subsequence

$$f_1^{(1)}, f_2^{(1)}, \ldots, f_n^{(1)}, \ldots$$

such that the numerical sequence

$$f_1^{(1)}(x_1), f_2^{(1)}(x_1), \ldots, f_n^{(1)}(x_1), \ldots$$

converges. By the same token, the subsequence $\{f_n^{(1)}\}$ in turn contains a subsequence

$$f_1^{(2)}, f_2^{(2)}, \ldots, f_n^{(2)}, \ldots$$

such that the sequence

$$f_1^{(2)}(x_2), f_2^{(2)}(x_2), \ldots, f_n^{(2)}(x_2), \ldots$$

converges. Continuing this construction, we get a system of subsequences $\{f_n^{(k)}\}$, $k = 1, 2, \ldots$, such that

1) $\{f_n^{(k+1)}\}$ is a subsequence of $\{f_n^{(k)}\}$ for all $k = 1, 2, \ldots$;

2) $\{f_n^{(k)}\}$ converges at the points $x_1, x_2, \ldots, x_k$.

Hence, taking the "diagonal sequence"

$$f_1^{(1)}, f_2^{(2)}, \ldots, f_n^{(n)}, \ldots,$$

we get a sequence of continuous linear functionals on $E$ such that

$$f_1^{(1)}(x_m), f_2^{(2)}(x_m), \ldots, f_n^{(n)}(x_m), \ldots$$

converges for every $m$ (apart from its first $m - 1$ terms, it is a subsequence of $\{f_n^{(m)}(x_m)\}$). But then, by Theorem 3', the sequence

$$f_1^{(1)}(x), f_2^{(2)}(x), \ldots$$

converges for all $x \in E$. ∎

Corollary 1. Every bounded set in the space $E^*$ conjugate to a separable normed linear space $E$ is relatively countably compact in the weak* topology.

Proof. An immediate consequence of Theorem 4 and the meaning of relative countable compactness (see Sec. 10.4). ∎

Corollary 2. A subset of the space $E^*$ conjugate to a separable Banach space $E$ is bounded if and only if it is relatively countably compact in the weak* topology.

Proof. An immediate consequence of Theorem 2' and Corollary 1. ∎

As we will see in a moment, the word "countably" is superfluous in Corollaries 1 and 2. First we need

Theorem 5. Given a separable normed linear space $E$, let $S$ be the closed unit sphere in $E$ and $S^*$ the closed unit sphere in the conjugate space $E^*$. Then the topology induced in $S^*$ by the weak* topology in $E^*$ is the same as that induced by the metric

$$\rho(f, g) = \sum_{n=1}^\infty 2^{-n}\,|(f - g, x_n)|,$$

where $\{x_1, \ldots, x_n, \ldots\}$ is any countable set everywhere dense in $S$.

Proof. Clearly, $\rho(f, g)$ has all the properties of a metric, and moreover is invariant under shifts, in the sense that

$$\rho(f + h, g + h) = \rho(f, g).$$

Hence we need only verify that

1) Every "open sphere"

$$Q_\varepsilon = \{f : \rho(f, 0) < \varepsilon\}$$

contains the intersection of $S^*$ with some weak* neighborhood of zero in $E^*$;

2) Every weak* neighborhood of zero in $E^*$ contains the intersection of $S^*$ with some $Q_\varepsilon$.















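Let $N$ be such that $2^{-N} < \varepsilon/2$, and consider the weak* neighborhood of zero

$$U = U_{x_1, \ldots, x_N; \varepsilon/2} = \left\{f : |(f, x_1)| < \frac{\varepsilon}{2}, \ldots, |(f, x_N)| < \frac{\varepsilon}{2}\right\}.$$

Then $f \in S^* \cap U$ implies (since $|(f, x_n)| \le \|f\|\,\|x_n\| \le 1$ for all $n$)

$$\rho(f, 0) = \sum_{n=1}^N 2^{-n}|(f, x_n)| + \sum_{n=N+1}^\infty 2^{-n}|(f, x_n)| < \frac{\varepsilon}{2}\sum_{n=1}^N 2^{-n} + \sum_{n=N+1}^\infty 2^{-n} < \frac{\varepsilon}{2} + 2^{-N} < \varepsilon,$$

and hence $S^* \cap U \subset Q_\varepsilon$. This proves 1).

To prove 2), this time let

$$U = U_{y_1, \ldots, y_m; \delta} = \{f : |(f, y_1)| < \delta, \ldots, |(f, y_m)| < \delta\}$$

be any weak* neighborhood of zero in $E^*$, where it can clearly be assumed that $\|y_1\| \le 1, \ldots, \|y_m\| \le 1$. Since $\{x_1, \ldots, x_n, \ldots\}$ is everywhere dense in $S$, there are indices $n_1, \ldots, n_m$ such that

$$\|y_k - x_{n_k}\| < \frac{\delta}{2} \qquad (k = 1, \ldots, m).$$

Let

$$N = \max\{n_1, \ldots, n_m\}, \qquad \varepsilon = \frac{\delta}{2^{N+1}}.$$

Then $f \in S^* \cap Q_\varepsilon$ implies

$$\sum_{n=1}^\infty 2^{-n}|(f, x_n)| < \varepsilon,$$

and hence

$$|(f, x_n)| < 2^n \varepsilon,$$

in particular

$$|(f, x_{n_k})| < 2^{n_k}\varepsilon \le 2^N \varepsilon = \frac{\delta}{2}.$$

Therefore $f \in S^* \cap Q_\varepsilon$ implies

$$|(f, y_k)| \le |(f, x_{n_k})| + |(f, y_k - x_{n_k})| < \frac{\delta}{2} + \|f\|\,\|y_k - x_{n_k}\| < \delta \qquad (k = 1, \ldots, m),$$

so that $S^* \cap Q_\varepsilon \subset U$. ∎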

We can now drop the word "countably" in Corollaries 1 and 2:

Corollary 1'. Every bounded set in the space $E^*$ conjugate to a separable normed linear space $E$ is relatively compact in the weak* topology.

Proof. Use Theorem 5 and the fact that compactness and countable compactness are equivalent concepts in a metric space (see Sec. 11.2). ∎

Corollary 2'. A subset of the space $E^*$ conjugate to a separable Banach space $E$ is bounded if and only if it is relatively compact in the weak* topology.

Proof. Identical with that of Corollary 1'. ∎

Finally we prove

Theorem 6. Every closed sphere in the space $(E^*, b)$ conjugate to a separable normed linear space $E$ is compact in the weak* topology.

Proof. Every closed sphere in the space $(E^*, b)$ is closed in the weak* topology. In fact, since a shift in $E^*$ carries every closed set (in the weak* topology) into another closed set, we need only prove the assertion for every sphere of the form

$$S_c = \{f : \|f\| \le c\}.$$

Suppose $f_0 \notin S_c$. Then, by the definition of the norm of the functional $f_0$, there is an element $x \in E$ such that $\|x\| = 1$ and

$$f_0(x) = a > c.$$

But then the set

$$U = \{f : f(x) > \tfrac{1}{2}(a + c)\}$$

is a weak* neighborhood of $f_0$ containing no elements of $S_c$ (if $f \in S_c$, then $f(x) \le \|f\| \le c < \frac{1}{2}(a + c)$). Therefore $S_c$ is closed in the weak* topology, and hence compact in the weak* topology, by Corollary 1'. ∎

Remark. Theorem 6 is a special case of the following more general theorem, which will not be proved here: Every bounded subset of the space $(E^*, b)$ conjugate to a locally convex topological linear space $E$ is relatively compact in the weak* topology.

Problem 1. Given a topological linear space E, suppose E has sufficiently 
many continuous linear functionals. Prove that E is a Hausdorff space, when 
equipped with the weak topology. 

Problem 2. Let $\{x_n\}$ be a sequence of elements in a Hilbert space H such
that

1) $\{x_n\}$ converges weakly to an element $x \in H$;

2) $\|x_n\| \to \|x\|$ as $n \to \infty$.

Prove that $\{x_n\}$ converges strongly to x, i.e., $\|x_n - x\| \to 0$ as $n \to \infty$.
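
Condition 2) cannot simply be dropped. As a standard illustration (not part of
the text), take the orthonormal vectors $e_n$ of $\ell^2$: by Bessel's inequality
$(e_n, y) \to 0$ for every $y$, so $e_n \to 0$ weakly, while $\|e_n\| = 1$ does not
tend to $\|0\| = 0$. The sketch below assumes a finite truncation of $\ell^2$ (the
constant `DIM`) and one hand-picked $y$; it only tabulates these quantities and
proves nothing.

```python
import math

DIM = 1000  # assumption: finite truncation of l^2, large enough for the indices used


def inner(u, v):
    """Inner product (u, v) = sum_k u_k v_k of real sequences."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(inner(u, u))


def unit_vector(n):
    """The n-th vector e_n (n >= 1) of the standard orthonormal system."""
    v = [0.0] * DIM
    v[n - 1] = 1.0
    return v


# A fixed element y = (1, 1/2, 1/3, ...) of l^2 (truncated).
y = [1.0 / k for k in range(1, DIM + 1)]

for n in (1, 10, 100, 1000):
    e_n = unit_vector(n)
    # (e_n, y) = 1/n -> 0, consistent with weak convergence of e_n to 0,
    # but ||e_n - 0|| = ||e_n|| = 1 for every n: no strong convergence.
    print(n, inner(e_n, y), norm(e_n))
```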
Problem 3. Prove that the conclusion of the preceding problem remains
valid if the condition 2) is replaced by either of the following conditions:

2') $\|x_n\| \le \|x\|$ for all $n$;

2'') $\limsup_{n \to \infty} \|x_n\| \le \|x\|$.

Problem 4. Let H be a (separable) Hilbert space and M a bounded subset 
of H. Prove that the topology in M induced by the weak topology in H can 
be specified by a metric. 

Problem 5. Prove that every closed convex subset of a Hilbert space H 
is closed in the weak topology (so that, in particular, every closed linear 
subspace of H is weakly closed). Give an example of a closed set in H which 
is not weakly closed. 

Problem 6. Show that the two conditions in Example 3, p. 199 are
sufficient as well as necessary for weak convergence of a sequence $\{x_n(t)\}$ in
$C_{[a,b]}$. Give an example of a weakly convergent sequence in $C_{[a,b]}$ which is
not strongly convergent.


21. Generalized Functions 

21.1. Preliminary remarks. The degree of generality attaching to the 
notion of “function” varies from problem to problem. Some problems 
involve continuous functions, others involve functions differentiable one or 
more times, and so on. However, there are a number of situations in which 
the classical notion of a function turns out to be inadequate, even when 
understood in the most general sense (i.e., as an arbitrary rule f assigning a
number $f(x)$ to each element x in the domain of definition of f). Here are
two such cases: 

1) A linear mass distribution can be conveniently characterized by giving 
the density of the distribution. However, no “ordinary” function can 
specify the density corresponding to one or more points with positive 
mass. 

2) In many problems, situations arise in which various mathematical 
operations cannot be carried out. For example, a function with no 
derivative (at certain, possibly all, points) cannot be differentiated if 
the derivative is interpreted in the usual way, as an “ordinary” 
function. Of course, such difficulties can be avoided without
relinquishing classical definitions, by suitably restricting the class of
“admissible functions,” for example, by considering only analytic 
functions. However, restricting the class of admissible functions in 
this way is often quite undesirable. Fortunately, it turns out that
difficulties of this kind can be overcome, and just as successfully at 
that, by enlarging (rather than restricting) the class of admissible 
functions, i.e., by introducing the notion of a “generalized function,” 
not encountered in classical analysis. In doing so, a key role will be 
played by the concept of a conjugate space, considered earlier in this 
chapter. 

Remark. It cannot be emphasized too strongly that the introduction of 
generalized functions is motivated by the need to solve perfectly concrete 
problems of analysis, and not merely by a desire to see how far the notion 
of function can be pushed. 

Before going into details, we indicate the basic idea behind the theory
of generalized functions. Let f be a fixed function on the real line, integrable
on every finite interval, and let $\varphi$ be any continuous function vanishing outside
some finite interval (such a function $\varphi$ is said to be finite¹²). Suppose each
$\varphi$ is assigned the number

$$(f, \varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \qquad (1)$$

involving the given function f, where the integration is in effect only over a
finite interval, because of the finiteness of $\varphi$. In other words, the function
f can be regarded as a functional (a linear functional, because of the basic
properties of the integral) defined on some space K of finite functions.
However, there are many other linear functionals on K besides functionals
of the form (1). For example, by assigning each function $\varphi$ its value at the
point $x = 0$, we get a linear functional which cannot be represented in the
form (1). In this sense, the functions f can be regarded as part of a much
larger set, namely the set of all possible linear functionals on K. The space
K of “test functions” $\varphi$ can be chosen in various ways. For example, K
might consist of all continuous finite functions, as above. However, as will
soon be apparent, it makes sense to require the test functions to satisfy rather
stringent smoothness conditions (besides being continuous and finite).

12 Do not confuse the notion of a finite function (which vanishes outside some finite
interval) with the notion of a bounded function (whose range is contained in some finite
interval). Finite functions are often called “functions of finite (or compact) support.”

21.2. The test space and test functions. Generalized functions. Turning
now to details, let K be the set of all finite functions $\varphi$ on $(-\infty, \infty)$ with
continuous derivatives of all orders (i.e., the set of all infinitely differentiable
finite functions), where every function $\varphi \in K$, being finite, vanishes
outside some interval depending on the choice of $\varphi$. Clearly K is a linear
space, when equipped with the usual operations of addition of functions and
multiplication of functions by numbers. Although the space K is not
normable, there is a natural way of introducing the notion of convergence in K:

Definition 1. A sequence $\{\varphi_n\}$ of functions in K is said to converge to
a function $\varphi \in K$ if

1) There exists an interval outside which all the functions $\varphi_n$ vanish;

2) The sequence $\{\varphi_n^{(k)}\}$ of derivatives of order k converges uniformly
on this interval to $\varphi^{(k)}$ for every $k = 0, 1, 2, \ldots$.¹³

The linear space K equipped with this notion of convergence is called the 
test space (or fundamental space), and the functions in K are called test
functions (or fundamental functions). 
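
Condition 1) of Definition 1 is essential. In the sketch below (an illustration,
assuming the standard function $e^{-1/(1-x^2)}$ on $(-1, 1)$ as a member of K), the
functions $\varphi_n(x) = \varphi(x/n)/n$ tend to zero uniformly together with all
their derivatives, since the k-th derivative is $n^{-k-1}\varphi^{(k)}(x/n)$, yet their
supports $[-n, n]$ lie in no single interval, so $\{\varphi_n\}$ does not converge
in K. The helper names `bump`, `spread` and `sup_on_grid` are hypothetical.

```python
import math


def bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1) and 0 elsewhere: an element of the test space K."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def spread(n):
    """phi_n(x) = bump(x/n)/n, which vanishes outside [-n, n]."""
    return lambda x: bump(x / n) / n


def sup_on_grid(f, a, b, steps=20001):
    """Approximate sup |f| on [a, b] by sampling a uniform grid."""
    h = (b - a) / (steps - 1)
    return max(abs(f(a + i * h)) for i in range(steps))


for n in (1, 10, 100):
    phi_n = spread(n)
    # sup |phi_n| = e^{-1}/n -> 0, yet phi_n(n/2) > 0:
    # no single interval contains the supports of all phi_n.
    print(n, sup_on_grid(phi_n, -n, n), phi_n(n / 2) > 0)
```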

Definition 2. Every continuous linear functional $T(\varphi)$ on the test
space K is called a generalized function on $(-\infty, \infty)$, where continuity of
$T(\varphi)$ means that $\varphi_n \to \varphi$ in K implies $T(\varphi_n) \to T(\varphi)$.

Let f(x) be a locally integrable function, i.e., a function integrable on 
every finite interval. Then f(x) generates a generalized function via the 
expression 

$$T_f(\varphi) = (f, \varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \qquad (2)$$

which is clearly a continuous linear functional on K. Generalized functions 
of this type will be called regular, and all other generalized functions, i.e., 
those not representable in the form (2), will be called singular. The following 
are all examples of singular generalized functions: 

Example 1. The “delta function” 

$$T(\varphi) = \varphi(0) \qquad (3)$$

is a continuous linear functional on K, i.e., a generalized function in the
sense of Definition 2. This functional can be written in the form

$$T(\varphi) = \int_{-\infty}^{\infty} \delta(x)\varphi(x)\,dx, \qquad (4)$$

where $\delta(x)$ is a “fictitious” function,¹⁴ equal to zero everywhere except at
$x = 0$ and such that

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1$$

(these properties are of course paradoxical), since then we have, purely
formally,

$$T(\varphi) = \int_{-\infty}^{\infty} \delta(x)\varphi(x)\,dx = \varphi(0)\int_{-\infty}^{\infty} \delta(x)\,dx = \varphi(0).$$

13 As always, $\varphi_n^{(0)} = \varphi_n$, $\varphi^{(0)} = \varphi$.

14 The term “delta function” will be applied to both the generalized function $T(\varphi)$ and
the fictitious function $\delta(x)$ generating $T(\varphi)$ via the representation (4).

The advantage of regarding the delta function as a functional on the test
space K rather than on the space $C_{[a,b]}$, as in Example 3, p. 124, will soon
be apparent.

Example 2. Generalizing (3) and (4), we can write the functional

$$T(\varphi) = \varphi(a) \qquad (3')$$

in the form

$$T(\varphi) = \int_{-\infty}^{\infty} \delta(x - a)\varphi(x)\,dx \qquad (4')$$

in terms of the “shifted delta function” $\delta(x - a)$.


21.3. Operations on generalized functions. Addition of generalized functions
and multiplication of generalized functions by numbers are defined
in the same way as for linear functionals in general, i.e., by the obvious
analogue of Definition 1, p. 183 (with $\varphi$ and K playing the roles of x and E).
In the case of regular generalized functions, these are just the operations
associated with the corresponding operations for “ordinary” functions. More
exactly, if

$$T_f(\varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \qquad T_g(\varphi) = \int_{-\infty}^{\infty} g(x)\varphi(x)\,dx,$$

where f and g are locally integrable and $\varphi \in K$, then clearly

$$(T_f + T_g)(\varphi) = T_f(\varphi) + T_g(\varphi) = T_{f+g}(\varphi)$$

and

$$(\alpha T_f)(\varphi) = \alpha T_f(\varphi) = T_{\alpha f}(\varphi)$$

for any number $\alpha$.

Definition 3. A sequence of generalized functions $\{T_n\}$ is said to converge
to a generalized function T if $T_n(\varphi) \to T(\varphi)$ for every $\varphi \in K$. The
space of generalized functions equipped with this notion of convergence
is denoted by $K^*$.

Remark. In other words, convergence of generalized functions is just 
weak* convergence of continuous linear functionals on K. 
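
A standard example of this weak* convergence (not taken from the text) is the
family of Gaussian densities $g_\varepsilon$ of width $\varepsilon$: as regular generalized
functions they converge to the delta function as $\varepsilon \to 0$, i.e.,
$\int g_\varepsilon(x)\varphi(x)\,dx \to \varphi(0)$ for every $\varphi \in K$. The sketch below checks
this numerically for one hypothetical test function, using a composite Simpson
rule in place of exact integration.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def phi(x):
    """A hypothetical test function: vanishes outside [-1, 1], phi(0) = e^{-1}."""
    return bump(x) * math.cos(x)


def gaussian_functional(eps):
    """Regular generalized function T_eps(test) = int g_eps(x) test(x) dx."""
    def g(x):
        return math.exp(-x * x / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))
    # Test functions passed in are assumed to vanish outside [-1, 1].
    return lambda test: simpson(lambda x: g(x) * test(x), -1.0, 1.0)


for eps in (0.1, 0.03, 0.01):
    print(eps, gaussian_functional(eps)(phi), phi(0.0))   # T_eps(phi) -> phi(0)
```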

We will often denote a generalized function by the symbol f, as if a
representation of the form

$$(f, \varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx \qquad (5)$$

existed, even in the case where the generalized function is singular. Let f be
a regular generalized function, and let $\alpha = \alpha(x)$ be an infinitely differentiable
“ordinary” function. Then (5) implies

$$(\alpha f, \varphi) = \int_{-\infty}^{\infty} \alpha(x) f(x)\varphi(x)\,dx = \int_{-\infty}^{\infty} f(x)\,[\alpha(x)\varphi(x)]\,dx = (f, \alpha\varphi),$$

where $\alpha\varphi$ obviously belongs to K. Carrying this over to the singular case,
we get
we get 

Definition 4. The product $\alpha f$ of an infinitely differentiable function $\alpha$
and a generalized function f is the functional defined by the formula

$$(\alpha f, \varphi) = (f, \alpha\varphi). \qquad (6)$$

Remark. It follows from (6) that the functional $\alpha f$ is linear and continuous,
and hence itself a generalized function. 
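
Definition 4 can be mimicked by representing a generalized function as a Python
callable that takes a test function and returns a number (a modeling assumption
of this sketch, not the text's construction). Applied to the delta function it
shows that $\alpha\delta = \alpha(0)\delta$, and in particular $x\delta(x) = 0$, since
$(\delta, x\varphi) = 0 \cdot \varphi(0)$.

```python
import math


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def delta(test, a=0.0):
    """The shifted delta function (3'): test -> test(a)."""
    return test(a)


def times(alpha, T):
    """Definition 4: (alpha T, phi) = (T, alpha phi) for an infinitely differentiable alpha."""
    return lambda test: T(lambda x: alpha(x) * test(x))


phi = lambda x: bump(x - 0.5)                  # hypothetical test function, phi(0) != 0
x_delta = times(lambda x: x, delta)            # the product x * delta(x)
print(x_delta(phi))                            # 0 * phi(0): x * delta = 0
g = times(lambda x: 2.0 + math.sin(x), delta)
print(g(phi), 2.0 * phi(0.0))                  # alpha * delta = alpha(0) * delta
```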

Again let T be a regular generalized function of the form 

$$T(\varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \qquad (5')$$

and suppose the derivative f' exists and is locally integrable. Then it is
natural to define the derivative of T as the functional

$$\frac{dT}{dx}(\varphi) = \int_{-\infty}^{\infty} f'(x)\varphi(x)\,dx. \qquad (7)$$

Integrating (7) by parts and using the fact that every test function $\varphi$ vanishes
outside some finite interval, we find at once that

$$\frac{dT}{dx}(\varphi) = -\int_{-\infty}^{\infty} f(x)\varphi'(x)\,dx, \qquad (8)$$

thereby obtaining an expression for $dT/dx$ which does not involve the
derivative of f. Carrying this over to the singular case, we get

Definition 5. The derivative $dT/dx$ of a generalized function T is the
functional defined by the formula

$$\frac{dT}{dx}(\varphi) = -T(\varphi'). \qquad (9)$$

Remark 1. The functional (9) is obviously linear and continuous, and 
hence itself a generalized function. Second, third and higher-order derivatives 
are defined in the same way. 
Remark 2. If a generalized function is denoted by the symbol f, as in (6),
then its derivative is denoted by f', and (9) takes the form

$$(f', \varphi) = -(f, \varphi'). \qquad (9')$$

It is an immediate consequence of Definition 5 that 

1) Every generalized function has derivatives of all orders;

2) If a sequence of generalized functions $\{f_n\}$ converges to a generalized
function f (in the sense of Definition 3), then the sequence of
derivatives $\{f_n'\}$ converges to the derivative f' of the limit function.¹⁵


Example 1. If f is a regular generalized function whose derivative exists
and is locally integrable (in particular, continuous or piecewise continuous),
then the derivative of f as a generalized function coincides with its derivative
in the ordinary sense. In fact, integrating (8) by parts, we get back (7).

Example 2. As in Example 1, p. 208, consider the delta function

$$T(\varphi) = \int_{-\infty}^{\infty} \delta(x)\varphi(x)\,dx.$$

It follows from Definition 5 that

$$\frac{dT}{dx}(\varphi) = -\int_{-\infty}^{\infty} \delta(x)\varphi'(x)\,dx = -\varphi'(0).$$

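Example 3. Consider the “step function”

$$f(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases} \qquad (10)$$

defining the linear functional

$$T(\varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx = \int_0^{\infty} \varphi(x)\,dx.$$

It follows from Definition 5 that

$$\frac{dT}{dx}(\varphi) = -\int_0^{\infty} \varphi'(x)\,dx = \varphi(0),$$

since $\varphi$ vanishes at infinity. Hence the derivative of (10) is just the delta
function $\delta(x)$.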
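
Examples 2 and 3 can be checked numerically if generalized functions are modeled
as Python callables on test functions (an assumption of this sketch): formula (9)
only needs the derivative of the test function, never that of the generalized
function. Below, $\varphi'$ is replaced by a central difference and integrals by
Simpson's rule, both choices of the sketch, and $(dT/dx)(\varphi)$ for the step
function (10) is compared with $\varphi(0)$.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def d(test, h=1e-5):
    """Central-difference approximation of test' (stands in for the exact derivative)."""
    return lambda x: (test(x + h) - test(x - h)) / (2 * h)


def derivative(T):
    """Definition 5, formula (9): (dT/dx)(phi) = -T(phi')."""
    return lambda test: -T(d(test))


# Step function (10) as a regular generalized function. Test functions below are
# assumed to vanish outside [-2, 2], so the integral over [0, inf) reduces to [0, 2].
step = lambda test: simpson(test, 0.0, 2.0)
delta = lambda test: test(0.0)

phi = lambda x: bump(x - 0.3)   # hypothetical test function, zero outside [-0.7, 1.3]
print(derivative(step)(phi), delta(phi))       # Example 3: step' = delta
print(derivative(delta)(phi), -d(phi)(0.0))    # Example 2: delta'(phi) = -phi'(0)
```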


15 Equivalently, every convergent series of generalized functions can be differentiated
term by term any number of times.

21.4. Differential equations and generalized functions. The development
of the theory of generalized functions was to a large extent motivated by
problems involving differential equations, particularly partial differential
equations. We now discuss a few simple ideas concerning generalized
functions and ordinary differential equations. The application of generalized
functions to partial differential equations is a subject lying beyond the scope
of this book.¹⁶

Lemma 1. A test function $\varphi_0$ can be represented as the derivative of
another test function $\varphi_1$ if and only if

$$\int_{-\infty}^{\infty} \varphi_0(x)\,dx = 0. \qquad (11)$$

Proof. If $\varphi_0(x) = \varphi_1'(x)$, where $\varphi_1$ is a test function, then

$$\int_{-\infty}^{\infty} \varphi_0(x)\,dx = \varphi_1(x)\Big|_{-\infty}^{\infty} = 0.$$

Conversely,

$$\varphi_1(x) = \int_{-\infty}^{x} \varphi_0(t)\,dt$$

is an infinitely differentiable function, with derivative $\varphi_0(x)$, and in fact
a finite function if (11) holds, since then $\varphi_0$ and $\varphi_1$ vanish outside the
same interval. ■

Lemma 2. Let $\varphi_1$ be a fixed test function such that

$$\int_{-\infty}^{\infty} \varphi_1(x)\,dx = 1. \qquad (12)$$

Then an arbitrary test function $\varphi$ can be represented in the form

$$\varphi = \varphi_0 + c\varphi_1,$$

where c is a constant and $\varphi_0$ is a test function which is the derivative of
another test function.

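Proof. Let

$$c = \int_{-\infty}^{\infty} \varphi(x)\,dx, \qquad \varphi_0(x) = \varphi(x) - \varphi_1(x)\int_{-\infty}^{\infty} \varphi(x)\,dx.$$

Then, by (12),

$$\int_{-\infty}^{\infty} \varphi_0(x)\,dx = 0,$$

and the proof follows from Lemma 1. ■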
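
The decomposition in Lemma 2 is explicit, so it can be sketched numerically.
Below, `phi1` is a normalized copy of $e^{-1/(1-x^2)}$ (an arbitrary choice
satisfying (12)), the test function `phi` is hypothetical, and Simpson's rule
stands in for the integrals. The printed values indicate that $\varphi_0$ satisfies
(11), so that its antiderivative is again finite, as in Lemma 1.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


L = 3.0                                # all test functions used here vanish outside [-L, L]
mass = simpson(bump, -L, L)
phi1 = lambda x: bump(x) / mass        # fixed test function satisfying (12)


def decompose(test):
    """Lemma 2: test = phi0 + c * phi1 with c = integral of test."""
    c = simpson(test, -L, L)
    phi0 = lambda x: test(x) - c * phi1(x)
    return c, phi0


phi = lambda x: bump(x - 1.0) * (1.0 + x)   # hypothetical test function, zero outside [0, 2]
c, phi0 = decompose(phi)
print(c, simpson(phi0, -L, L))   # second number ~0: phi0 satisfies (11)
# Lemma 1: the antiderivative int_{-inf}^x phi0(t) dt is already ~0 at x = 2.5,
# to the right of the supports of phi and phi1.
print(simpson(phi0, -L, 2.5))
```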


16 See e.g., A. Friedman, Generalized Functions and Partial Differential Equations, 
Prentice-Hall, Inc., Englewood Cliffs, N.J. (1963). A key role in the development of the 
theory of generalized functions was played by the pioneer work of L. Schwartz, Théorie
des Distributions, Hermann et Cie., Paris, Volume 1 (1957), Volume 2 (1959). 


Theorem 1. Every solution of the differential equation 

$$y' = 0 \qquad (13)$$

(in the space K* of generalized functions) is a constant. 

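Proof. Equation (13) means that

$$(y', \varphi) = (y, -\varphi') = 0 \qquad (14)$$

for every $\varphi \in K$. This determines the value of the functional

$$(y, \varphi) = \int_{-\infty}^{\infty} y(x)\varphi(x)\,dx$$

for every function in the space $K' \subset K$ of all test functions which are
derivatives of other test functions. In fact,

$$(y, \varphi_0) = 0$$

for every $\varphi_0 \in K'$. Let $\varphi$ be an arbitrary test function. By Lemma 2,
$\varphi = \varphi_0 + c\varphi_1$, where $\varphi_0 \in K'$ and $\varphi_1$ is a fixed test function satisfying
the condition (12). We are free to give $(y, \varphi_1)$ any value at all, without
violating (14). Let

$$(y, \varphi_1) = \alpha = \text{const}.$$

Then, since $c = \int_{-\infty}^{\infty} \varphi(x)\,dx$,

$$(y, \varphi) = (y, \varphi_0 + c\varphi_1) = (y, \varphi_0) + c(y, \varphi_1) = \alpha c = \int_{-\infty}^{\infty} \alpha\,\varphi(x)\,dx,$$

i.e., y is the constant $\alpha$. Moreover, y satisfies the differential equation (13).
In fact, $\varphi \in K$ implies $-\varphi' \in K'$ and hence

$$(y', \varphi) = (y, -\varphi') = 0.$$

■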

Corollary. If two generalized functions f and g have the same derivative,
then f = g + const.

Proof. Obvious, since $(f - g)' = 0$. ■

Theorem 2. Given any generalized function f, there is another
generalized function y satisfying the differential equation

$$y' = f. \qquad (15)$$

Proof. Any generalized function satisfying (15) is called an antiderivative
of f. Equation (15) means that

$$(y', \varphi) = (y, -\varphi') = (f, \varphi) = \left(f, \int_{-\infty}^{x} \varphi'(t)\,dt\right) \qquad (16)$$

for every $\varphi \in K$. This determines the value of the functional $(y, \varphi)$ for
every function in the space $K' \subset K$ of all test functions which are
derivatives of other test functions. In fact,

$$(y, \varphi_0) = \left(f, -\int_{-\infty}^{x} \varphi_0(t)\,dt\right)$$

for every $\varphi_0 \in K'$. Let $\varphi$ be an arbitrary test function. By Lemma 2,
$\varphi = \varphi_0 + c\varphi_1$, where $\varphi_0 \in K'$ and $\varphi_1$ is a fixed test function satisfying
(12). We are free to give $(y, \varphi_1)$ any value at all, without violating (16).
Let

$$(y, \varphi_1) = \alpha = \text{const},$$

so that, by linearity, $(y, \varphi) = (y, \varphi_0) + c\alpha$. Then y satisfies the
differential equation (15). In fact, $\varphi \in K$ implies $-\varphi' \in K'$ and hence

$$(y', \varphi) = (y, -\varphi') = \left(f, \int_{-\infty}^{x} \varphi'(t)\,dt\right) = (f, \varphi).$$

■

Corollary. Any two antiderivatives of a generalized function f differ
only by a constant.

Proof. Obvious by construction or from the corollary to Theorem 1. ■
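
The proof of Theorem 2 is constructive, and for $f = \delta$ the construction can
be followed numerically. There $(y, \varphi_0) = -\int_{-\infty}^{0} \varphi_0(t)\,dt$, and the
choice $\alpha = \int_0^\infty \varphi_1(x)\,dx$ makes y coincide with the step function (10),
in agreement with Example 3 of Sec. 21.3. The names, the choice of $\varphi_1$, the
test function and the Simpson quadrature below are assumptions of this sketch.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


L = 3.0                                  # test functions used here vanish outside [-L, L]
mass = simpson(bump, -L, L)
phi1 = lambda x: bump(x) / mass          # fixed test function satisfying (12)


def antiderivative_of_delta(alpha):
    """Theorem 2's construction for f = delta, with the free value (y, phi1) = alpha."""
    def y(test):
        c = simpson(test, -L, L)
        phi0 = lambda x: test(x) - c * phi1(x)        # Lemma 2
        # (y, phi0) = (delta, -int_{-inf}^x phi0(t) dt) = -int_{-inf}^0 phi0(t) dt
        return -simpson(phi0, -L, 0.0) + c * alpha
    return y


alpha = simpson(phi1, 0.0, L)            # alpha = (H, phi1), H the step function (10)
y = antiderivative_of_delta(alpha)
phi = lambda x: bump(x - 0.4) * (1.0 + x * x)   # hypothetical test function
print(y(phi), simpson(phi, 0.0, L))      # agree: this antiderivative of delta is H
```

Replacing `alpha` by `alpha + 1.0` changes `y(phi)` by exactly $c = \int\varphi\,dx$,
i.e., adds the constant 1 to y, which is what the corollary above asserts.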

21.5. Further developments. We now sketch some of the many extensions 
and modifications of the notion of generalized functions. 

a) Generalized functions of several variables. Let $K_n$ be the set of all
functions $\varphi(x_1, \ldots, x_n)$ of n variables with partial derivatives of all orders
with respect to all arguments, such that every $\varphi \in K_n$ vanishes outside some
parallelepiped

$$a_i \le x_i \le b_i \qquad (i = 1, \ldots, n) \qquad (17)$$

in n-space. Then $K_n$ is a linear space, with addition of functions and
multiplication of functions by numbers defined in the usual way. We introduce
convergence in $K_n$ by the natural generalization of Definition 1, i.e., a
sequence $\{\varphi_k\}$ of functions in $K_n$ is said to converge to a function $\varphi \in K_n$ if

1) There exists a parallelepiped (17) outside which all the functions $\varphi_k$
vanish;

2) The sequence of partial derivatives

$$\frac{\partial^r \varphi_k}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}} \qquad (\alpha_1 + \cdots + \alpha_n = r)$$

converges uniformly on this parallelepiped to the partial derivative

$$\frac{\partial^r \varphi}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$$

for all $r, \alpha_1, \ldots, \alpha_n$.


Every continuous linear functional on $K_n$ is then called a generalized function
of n variables, and moreover every “ordinary” function $f(x_1, \ldots, x_n)$ of n
variables integrable on every parallelepiped can be regarded as a generalized
function, in fact the one giving rise to the functional

$$(f, \varphi) = \int f(x)\varphi(x)\,dx,$$

where

$$x = (x_1, \ldots, x_n), \qquad dx = dx_1 \cdots dx_n$$

and the integral is over all of n-space. Convergence of generalized functions
is defined by the obvious analogue of Definition 3, while partial derivatives
of generalized functions are defined by the formula

$$\left(\frac{\partial^r f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}, \varphi\right) = (-1)^r \left(f, \frac{\partial^r \varphi}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}\right).$$

It is clear that every generalized function of n variables has partial derivatives
of all orders.


b) Complex generalized functions. So far we have only considered real 
generalized functions. Suppose the test functions are now allowed to be 
complex-valued, but still finite and infinitely differentiable. Then every 
continuous linear functional on the corresponding test space K is called a 
complex generalized function. If $(f, \varphi)$ is such a functional, then

$$(f, \alpha\varphi) = \alpha(f, \varphi).$$

We can also consider conjugate-linear functionals on K, satisfying the
condition (cf. p. 123)

$$(f, \alpha\varphi) = \bar{\alpha}(f, \varphi),$$

where the overbar denotes the complex conjugate. If f is an “ordinary”
complex-valued function on the line, there are two natural ways of associating
linear functionals with f, i.e.,

$$(f, \varphi)_1 = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \qquad (f, \varphi)_2 = \int_{-\infty}^{\infty} \overline{f(x)}\,\varphi(x)\,dx,$$

and two natural ways of associating conjugate-linear functionals with f:

$$(f, \varphi)_3 = \int_{-\infty}^{\infty} f(x)\,\overline{\varphi(x)}\,dx, \qquad (f, \varphi)_4 = \int_{-\infty}^{\infty} \overline{f(x)}\,\overline{\varphi(x)}\,dx.$$

Each of these four choices corresponds to a possible way of embedding the
space of “ordinary” functions in the space of generalized functions. Operations
on complex generalized functions are defined by analogy with the real
case.

c) Generalized functions on the circle. Sometimes it is convenient to 
consider generalized functions defined on a bounded set. As a simple example, 
consider generalized functions on a circle C, choosing the test space $K_C$ to
be the set of all infinitely differentiable functions on C, equipped with the
usual operations of addition of functions and multiplication of functions by
numbers. (Note that the test functions are now automatically finite, since C
is bounded.) Then every continuous linear functional on $K_C$ is called a
generalized function on the circle. Every “ordinary” function on C can be
regarded as a periodic function on the line. In the same way, we regard
every generalized function on the circle as a periodic generalized function,
where a generalized function f is said to be periodic, with period a, if

$$(f(x), \varphi(x - a)) = (f(x), \varphi(x))$$

for every test function $\varphi \in K$.

d) Other test spaces. There are many possible choices of the test space 
other than the space of infinitely differentiable finite functions. For example, 
we can choose the test space to be the somewhat larger space $S_\infty$ of all
infinitely differentiable functions which, together with all their derivatives,
approach zero faster than any power of $1/|x|$. More exactly, a function $\varphi$
belongs to $S_\infty$ if and only if, given any $p, q = 0, 1, 2, \ldots$, there is a constant
$C_{pq}$ (depending on p, q and $\varphi$) such that¹⁷

$$|x^p \varphi^{(q)}(x)| \le C_{pq} \qquad (-\infty < x < \infty).$$

A sequence $\{\varphi_n\}$ of functions in $S_\infty$ is said to converge to a function
$\varphi \in S_\infty$ if

1) The sequence $\{\varphi_n^{(q)}\}$ converges uniformly to $\varphi^{(q)}$ on every finite interval;

2) The constants $C_{pq}$ in the inequalities

$$|x^p \varphi_n^{(q)}(x)| \le C_{pq}$$

can be chosen independently of n.

There are somewhat fewer continuous linear functionals on $S_\infty$ than on K.
For example, the function $f(x) = e^{x^2}$ corresponds to a continuous linear
functional $(f, \varphi)$ on K but not on $S_\infty$ (indeed $e^{-x^2} \in S_\infty$, while
$\int_{-\infty}^{\infty} e^{x^2} e^{-x^2}\,dx$ diverges).

Remark. As the theory of generalized functions has evolved, it has
become apparent that there is no need to commit oneself once and for all
to any definite choice of test space. Rather it is best to choose a test space
which is most suitable for solving the class of problems at hand. In general,
the smaller the test space, the greater the freedom in carrying out various
analytical operations (differentiation, passage to the limit, etc.) and the larger
the number of continuous linear functionals on the space (why?). However,
we must make sure not to make the test space too small, i.e., we must require
not only that the test functions be “sufficiently smooth” but also that there be
“sufficiently many” of them (in the sense of Problem 9) to allow us to “tell
ordinary functions¹⁸ apart.”

Problem 1. In the test space K of all infinitely differentiable finite functions,
consider the neighborhood base at zero consisting of all sets of the
form

$$U_{\gamma_0, \ldots, \gamma_n} = \{\varphi : \varphi \in K,\ |\varphi(x)| < \gamma_0(x), \ldots, |\varphi^{(n)}(x)| < \gamma_n(x) \text{ for all } x\}$$

for some positive functions $\gamma_0, \ldots, \gamma_n$ continuous on $(-\infty, \infty)$. Prove that
the topology generated in K by this neighborhood base leads to the same kind
of convergence in K as in Definition 1.

Comment. There are other topologies in K leading to the same
convergence.

Problem 2. Let K be the test space of all infinitely differentiable finite
functions, and let $K_m$ be the subspace of K consisting of all functions $\varphi \in K$
vanishing outside the interval $[-m, m]$. We can make $K_m$ into a countably
normed space by setting

$$\|\varphi\|_n = \sup_{x,\ 0 \le k \le n} |\varphi^{(k)}(x)| \qquad (n = 0, 1, 2, \ldots)$$

(cf. Problem 12a, p. 171). Verify that the topology induced in $K_m$ by the
system of norms $\|\cdot\|_n$ coincides with the topology induced in $K_m$ by the
topology of Problem 1. Verify that the convergence induced in $K_m$
by the norms $\|\cdot\|_n$ coincides with the convergence induced in $K_m$ by the
convergence in Definition 1. Clearly $K_1 \subset K_2 \subset \cdots \subset K_m \subset \cdots$, and

$$K = \bigcup_{m=1}^{\infty} K_m.$$

Show that a set $Q \subset K$ is bounded with respect to the topology in K if and
only if there is an integer m such that Q is a bounded subset of the countably
normed space $K_m$.

Problem 3. Let K and $K_m$ be the same as in Problem 2, and let T be a
linear functional on K. Prove that the following four conditions are
equivalent:

a) T is continuous with respect to the topology of the space K;

b) T is bounded on every bounded subset $Q \subset K$;

c) If $\varphi_n \in K$ and $\varphi_n \to 0$, then $T(\varphi_n) \to 0$ (provided convergence of
sequences is defined as in Definition 1);

d) The restriction $T_m$ of the functional T to the space $K_m \subset K$ is a
continuous functional on $K_m$ for every $m = 1, 2, \ldots$

17 As an exercise, verify that this is the same space $S_\infty$ as in Problem 12b, p. 172.

18 More exactly, regular generalized functions.

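Problem 4. Let

$$T(\varphi) = \int_{-\infty}^{\infty} \frac{\varphi(x)}{x}\,dx \qquad (18)$$

for every $\varphi$ in the test space K. Prove that $T(\varphi)$ is a generalized function
if the integral is understood in the sense of the Cauchy principal value.

Hint. If $\varphi$ vanishes outside the interval $[a, b]$ (with $a < 0 < b$), write

$$\int_{-\infty}^{\infty} \frac{\varphi(x)}{x}\,dx = \int_a^b \frac{\varphi(x) - \varphi(0)}{x}\,dx + \int_a^b \frac{\varphi(0)}{x}\,dx.$$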
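
The hint can be turned into a numerical sketch (not a proof). For $a < 0 < b$ the
principal value of $\int_a^b dx/x$ is $\ln(b/|a|)$, and the remaining integrand
$(\varphi(x) - \varphi(0))/x$ has a removable singularity at 0. As a consistency
check, the result should not depend on the interval $[a, b]$ chosen, and it
should vanish for even $\varphi$. Simpson's rule, the central-difference value at 0
and the test functions are choices of this sketch.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def principal_value(test, a, b):
    """PV of int test(x)/x dx via the Hint; assumes a < 0 < b and test = 0 outside [a, b]."""
    p0 = test(0.0)

    def quotient(x):
        # (test(x) - test(0))/x has a removable singularity at 0, with value test'(0).
        if abs(x) < 1e-6:
            h = 1e-6
            return (test(h) - test(-h)) / (2 * h)
        return (test(x) - p0) / x

    # PV of int_a^b test(0)/x dx equals test(0) * ln(b/|a|).
    return simpson(quotient, a, b) + p0 * math.log(b / -a)


phi = lambda x: bump(x - 0.2)    # hypothetical test function, zero outside [-0.8, 1.2]
print(principal_value(phi, -1.0, 2.0), principal_value(phi, -3.0, 1.5))  # same value
print(principal_value(bump, -1.0, 1.0))   # ~0 for an even test function
```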

Problem 5. Prove that the delta function and its derivative are singular 
generalized functions. Prove that the same is true of (18). 

Problem 6. Prove that addition of two generalized functions and
multiplication of a generalized function by an infinitely differentiable
function $\alpha$ (in particular, a constant) are continuous operations in the sense
that $f_n \to f$, $g_n \to g$ imply $f_n + g_n \to f + g$, $\alpha f_n \to \alpha f$. Prove that there
is no way of similarly defining a continuous product of two generalized
functions, unless the functions are regular, in which case the appropriate
definition is $T_{fg} = T_f T_g$, where

$$T_f(\varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \qquad T_g(\varphi) = \int_{-\infty}^{\infty} g(x)\varphi(x)\,dx,$$

$$T_f T_g(\varphi) = \int_{-\infty}^{\infty} f(x)g(x)\varphi(x)\,dx.$$

Problem 7. Let f be a piecewise continuous function on $(-\infty, \infty)$,
differentiable everywhere except at the points $x_1, x_2, \ldots, x_n, \ldots$, where it
has jumps

$$f(x_n + 0) - f(x_n - 0) = h_n \qquad (n = 1, 2, \ldots).$$

Prove that the generalized derivative of f (i.e., the derivative of f regarded as
a generalized function) is the sum of its ordinary derivative (at the points
where it exists) and the generalized function

$$g(x) = \sum_{n=1}^{\infty} h_n\,\delta(x - x_n).$$

Comment. Note that $(g, \varphi)$ reduces to a finite sum for every test function $\varphi$.
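
A numerical sanity check of Problem 7 (an illustration, not a proof): for a
hypothetical f equal to $\sin x$ plus two jumps, the value $-(f, \varphi')$ given by
(9') should equal $\int f'(x)\varphi(x)\,dx + \sum_n h_n\varphi(x_n)$. The sketch integrates
piece by piece so that each jump falls on an integration endpoint, and uses the
exact derivative of the test function $e^{-1/(1-x^2)}$.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def dbump(x):
    """Exact derivative of bump on (-1, 1): bump(x) * (-2x / (1 - x^2)^2); zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    s = 1.0 - x * x
    return bump(x) * (-2.0 * x / (s * s))


# Hypothetical jumps (x_n, h_n), sorted by x_n and lying inside (-1, 1).
JUMPS = [(-0.25, -1.0), (0.5, 2.0)]


def piece(k):
    """f on the k-th interval between consecutive jumps: sin(x) plus the jumps to its left."""
    offset = sum(h for _, h in JUMPS[:k])
    return lambda x: math.sin(x) + offset


def generalized_derivative(test, dtest, a=-1.0, b=1.0):
    """(f', test) = -(f, test') by (9'); test is assumed to vanish outside [a, b]."""
    cuts = [a] + [x for x, _ in JUMPS] + [b]
    total = 0.0
    for k in range(len(cuts) - 1):
        fk = piece(k)
        total += simpson(lambda x: fk(x) * dtest(x), cuts[k], cuts[k + 1])
    return -total


lhs = generalized_derivative(bump, dbump)
# Ordinary derivative cos(x) paired with the test function, plus sum_n h_n * test(x_n).
rhs = simpson(lambda x: math.cos(x) * bump(x), -1.0, 1.0) + sum(h * bump(x) for x, h in JUMPS)
print(lhs, rhs)
```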
Problem 8. Find the generalized derivative of the function of period $2\pi$
equal to

$$f(x) = \begin{cases} \dfrac{\pi - x}{2} & \text{if } 0 < x \le \pi, \\ 0 & \text{if } x = 0, \\ -\dfrac{\pi + x}{2} & \text{if } -\pi \le x < 0 \end{cases} \qquad (19)$$

in the interval $[-\pi, \pi]$.

Ans. $f'(x) = -\dfrac{1}{2} + \pi \displaystyle\sum_{n=-\infty}^{\infty} \delta(x - 2n\pi)$.

Comment. The function (19) is the sum of the trigonometric series

$$\sum_{n=1}^{\infty} \frac{\sin nx}{n}. \qquad (20)$$

Differentiating (20) term by term, we get the divergent series

$$\sum_{n=1}^{\infty} \cos nx.$$

Hence the concept of a generalized function allows us to ascribe a definite 
meaning to a series that diverges in the ordinary sense. The same can be 
done for many divergent integrals (like those encountered in quantum field 
theory and other branches of theoretical physics). 
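
The answer to Problem 8 can be observed numerically: paired with a test function
vanishing outside $[-1, 1]$, the partial sums $S_N(x) = \sum_{n=1}^{N} \cos nx$ of the
divergent series approach $-\tfrac{1}{2}\int\varphi\,dx + \pi\varphi(0)$, since only the term
$n = 0$ of the delta sum meets the support. The test function and the Simpson
quadrature are choices of this sketch.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def paired_partial_sum(N, test):
    """(S_N, test) with S_N(x) = cos x + ... + cos Nx; test must vanish outside [-1, 1]."""
    S_N = lambda x: sum(math.cos(n * x) for n in range(1, N + 1))
    return simpson(lambda x: S_N(x) * test(x), -1.0, 1.0)


# The answer to Problem 8 applied to this test function.
limit = -0.5 * simpson(bump, -1.0, 1.0) + math.pi * bump(0.0)
for N in (5, 20, 100):
    print(N, paired_partial_sum(N, bump), limit)
```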

Problem 9. Prove that the test space K of all infinitely differentiable finite
functions has “sufficiently many” functions in the sense that, given any two
distinct continuous functions $f_1$ and $f_2$, there exists a function $\varphi \in K$ such that

$$\int_{-\infty}^{\infty} f_1(x)\varphi(x)\,dx \ne \int_{-\infty}^{\infty} f_2(x)\varphi(x)\,dx.$$

Hint. Since $f(x) = f_1(x) - f_2(x) \not\equiv 0$, there is a point $x_0$ such that
$f(x_0) \ne 0$, and hence an interval $[\alpha, \beta]$ in which $f(x)$ does not change sign.
Let

$$\varphi(x) = \begin{cases} e^{-1/(x - \alpha)}\, e^{-1/(\beta - x)} & \text{if } \alpha < x < \beta, \\ 0 & \text{otherwise}. \end{cases}$$

Then $\varphi \in K$ and

$$\int_{-\infty}^{\infty} f(x)\varphi(x)\,dx = \int_{\alpha}^{\beta} f(x)\varphi(x)\,dx \ne 0.$$

Comment. This result can be extended to functions more general than 
continuous functions, with the help of the concept of the Lebesgue integral 
(introduced in Sec. 29). 
Problem 10. Consider the homogeneous system of n linear differential
equations

$$y_i' = \sum_{k=1}^{n} a_{ik}(x) y_k \qquad (i = 1, \ldots, n) \qquad (21)$$

in n unknowns $y_1, \ldots, y_n$, where the $a_{ik}$ are infinitely differentiable functions.
Prove that every solution of (21) in the class $K^*$ of generalized functions is a
set of “ordinary” (in fact, infinitely differentiable) functions.

Comment. This can be expressed by saying that every “generalized 
solution” of (21) is also a “classical solution.” 

Problem 11. Consider the nonhomogeneous system of n linear differential
equations

$$y_i' = \sum_{k=1}^{n} a_{ik}(x) y_k + f_i(x) \qquad (i = 1, \ldots, n), \qquad (22)$$

where the $a_{ik}$ are infinitely differentiable functions and the $f_i$ are generalized
functions. Prove that (22) has a generalized solution, which is unique to
within a solution of the homogeneous system (21). What happens if the $f_i$
are “ordinary” functions?

Problem 12. Interpret

$$f(x) = \sum_{n=1}^{\infty} \cos nx$$

as a periodic generalized function. 

Hint. Recall Problem 8. 

Problem 13. Show that $S_\infty$ becomes a countably normed space when
equipped with the system of norms

$$\|\varphi\|_n = \sum_{p+q=n} \sup_{-\infty < x < \infty} |(1 + |x|)^p \varphi^{(q)}(x)|.$$

Prove that convergence of sequences in this countably normed space is
equivalent to convergence of sequences in $S_\infty$ as defined on p. 216.
22. Basic Concepts

22.1. Definitions and examples. Given two topological linear spaces E and
$E_1$, any mapping

$$y = Ax \qquad (x \in E,\ y \in E_1)$$

of a subset of E (possibly E itself) into $E_1$ is called an operator (from E to
$E_1$). The operator A is said to be linear if

$$A(\alpha x_1 + \beta x_2) = \alpha A x_1 + \beta A x_2.$$

Let $D_A$ be the set of all $x \in E$ for which A is defined. Then $D_A$ is called the
domain (of definition) of the operator A. Although in general $D_A$ need not
equal E, we will always assume that $D_A$ is a linear subspace of E, i.e., that
$x, y \in D_A$ implies $\alpha x + \beta y \in D_A$ for all $\alpha$ and $\beta$.

The operator A is said to be continuous at the point $x_0 \in D_A$ if, given any
neighborhood V of the point $y_0 = Ax_0$, there is a neighborhood U of the point
$x_0$ such that $Ax \in V$ for all $x \in U \cap D_A$. We say that the operator A is
continuous if it is continuous at every point $x_0 \in D_A$.

Remark 1. Suppose E and $E_1$ are normed linear spaces. Then it is easy
to see that a linear operator A is continuous if and only if, given any
$\varepsilon > 0$, there is a $\delta > 0$ such that

$$\|x' - x''\| < \delta \qquad (x', x'' \in D_A)$$

implies

$$\|Ax' - Ax''\| < \varepsilon.$$
Remark 2. In the case where $E_1$ is the real line, the concept of a linear
operator reduces to that of a linear functional, and the definition of continuity 
reduces to that given on p. 175. As we will see below, much of the theory 
of linear functionals carries over in a straightforward way to the case of 
linear operators. 

Example 1. Given a topological linear space E, let $Ix = x$ for all $x \in E$.
Then I is a continuous linear operator, called the identity (or unit) operator,
carrying each element of E into itself.

Example 2. Let E and $E_1$ be arbitrary topological linear spaces, and let
$Ox = 0$ for all $x \in E$, where 0 is the zero element of the space $E_1$. Then O
is a continuous linear operator, called the zero operator.

Example 3. Suppose A is a linear operator mapping the m-dimensional
space $R^m$ with basis $e_1, \ldots, e_m$ into the n-dimensional space $R^n$ with basis
$e'_1, \ldots, e'_n$. If x is an arbitrary vector in $R^m$, then

$$x = \sum_{j=1}^{m} x_j e_j,$$

and hence, by the linearity of A,

$$y = Ax = \sum_{j=1}^{m} x_j A e_j.$$

Thus the operator A is completely determined once we know the vectors in
$R^n$ into which A carries the basis vectors $e_1, \ldots, e_m$. Suppose we expand
each vector $Ae_j$ with respect to the basis $e'_1, \ldots, e'_n$, obtaining

$$A e_j = \sum_{i=1}^{n} a_{ij} e'_i.$$

Then

$$y = \sum_{i=1}^{n} y_i e'_i = \sum_{j=1}^{m} x_j A e_j = \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij} x_j e'_i,$$

and hence

$$y_i = \sum_{j=1}^{m} a_{ij} x_j,$$

i.e., the operator A is completely determined by the matrix $\|a_{ij}\|$ made up of
the coefficients $a_{ij}$.
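
The recipe of Example 3 can be sketched directly, assuming the standard bases of
$R^m$ and $R^n$ and an operator given as a rule rather than as a matrix: the j-th
column of $\|a_{ij}\|$ is the coordinate vector of $Ae_j$, and $y_i = \sum_j a_{ij}x_j$
then reproduces A. The operator `A` and the helper names are hypothetical.

```python
def A(x):
    """A hypothetical linear operator R^3 -> R^2, given as a rule rather than a matrix."""
    x1, x2, x3 = x
    return [2 * x1 - x3, x2 + 4 * x3]


def matrix_of(op, m):
    """a_ij = i-th coordinate of op(e_j), where e_1, ..., e_m is the standard basis."""
    cols = [op([1.0 if k == j else 0.0 for k in range(m)]) for j in range(m)]
    n = len(cols[0])
    return [[cols[j][i] for j in range(m)] for i in range(n)]


def apply(a, x):
    """y_i = sum_j a_ij x_j."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


a = matrix_of(A, 3)
print(a)                  # [[2.0, 0.0, -1.0], [0.0, 1.0, 4.0]]
x = [1.0, -2.0, 0.5]
print(apply(a, x), A(x))  # the matrix reproduces the operator
```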

Example 4. Let $H_1$ be any subspace of a Hilbert space H, and let
$H_2 = H \ominus H_1$ be the orthogonal complement of $H_1$, so that an arbitrary
element $h \in H$ has a unique representation of the form

$$h = h_1 + h_2 \qquad (h_1 \in H_1,\ h_2 \in H_2)$$

(see Theorem 14, p. 158). Let

$$Ph = h_1.$$

Then P is a continuous linear operator, called a projection operator. Interpreted
geometrically, P “projects the whole space H onto the subspace $H_1$.”
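
In a finite-dimensional Hilbert space such as $R^4$, the projection of Example 4
can be computed from any orthonormal basis $u_1, \ldots, u_k$ of $H_1$ as
$Ph = \sum_k (h, u_k)u_k$. The sketch below (the subspace and the vector are
hypothetical, and Gram-Schmidt is just one way to obtain the basis) checks that
$h - Ph$ is orthogonal to $H_1$ and that $P(Ph) = Ph$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Gram-Schmidt: an orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis


def projection(spanning):
    """P h = sum_k (h, u_k) u_k, where {u_k} is an orthonormal basis of H_1."""
    basis = orthonormalize(spanning)

    def P(h):
        out = [0.0] * len(h)
        for u in basis:
            c = dot(h, u)
            out = [oi + c * ui for oi, ui in zip(out, u)]
        return out

    return P


H1 = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]   # hypothetical subspace of R^4
P = projection(H1)
h = [3.0, -1.0, 2.0, 5.0]
h1 = P(h)
h2 = [a - b for a, b in zip(h, h1)]
print(h1, [dot(h2, v) for v in H1])   # h2 is orthogonal to H_1 (up to rounding)
```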

22.2. Continuity and boundedness. A linear operator mapping E into $E_1$
is said to be bounded if it maps every bounded subset of E into a bounded
subset of $E_1$. The operator analogue of Theorem 3, p. 176 for functionals is
given by

Theorem 1. A necessary condition for a linear operator A to be continuous
on a topological linear space E is that A be bounded. The condition
is also sufficient if E satisfies the first axiom of countability.

Proof. To prove the necessity, suppose $A$ is continuous and suppose there is a bounded set $M$ in $E$ whose image $AM = \{y : y = Ax,\ x \in M\}$ is unbounded in $E_1$. Then there is a neighborhood $V$ of zero in $E_1$ such that none of the sets

$$\frac{1}{n} AM \qquad (n = 1, 2, \dots)$$

is contained in $V$. Hence there is a sequence $\{x_n\}$ of elements of $M$ such that none of the elements

$$\frac{1}{n} Ax_n \qquad (n = 1, 2, \dots)$$

belongs to $V$. But then the sequence $\{x_n/n\}$ converges to zero in $E$ (recall Problem 6b, p. 170), while the sequence $\{Ax_n/n\}$ fails to converge to zero in $E_1$, contrary to the assumption that $A$ is continuous.

As for the sufficiency, let $\{U_n\}$ be a countable neighborhood base at zero in $E$ such that

$$U_1 \supset U_2 \supset \cdots \supset U_n \supset \cdots.$$

If $A$ fails to be continuous on $E$, then, by the operator analogue of Theorem 1, p. 175,¹ there is a neighborhood $V$ of zero in $E_1$ and a sequence $\{x_n\}$ in $E$ such that

$$x_n \in \frac{1}{n} U_n, \qquad Ax_n \notin V \qquad (n = 1, 2, \dots).$$

¹ As an exercise, state and prove this analogue.


The sequence $\{nx_n\}$ is bounded in $E$ (and even converges to zero, since $nx_n \in U_n$), while the sequence $\{nAx_n\}$ is unbounded in $E_1$, since it is contained in none of the sets $nV$. But then $A$ fails to be bounded on the bounded set $\{x_1, 2x_2, \dots, nx_n, \dots\}$, contrary to hypothesis. ∎

Next we consider the operator analogues of Definition 2 and Theorem 4, p. 177. Suppose $E$ and $E_1$ are both normed linear spaces, so that, in particular, $E$ satisfies the first axiom of countability. Then, by Theorem 1, a linear operator $A$ mapping $E$ into $E_1$ is continuous if and only if it is bounded. But by a bounded set in a normed linear space we mean a set contained in some closed sphere $\|x\| \le C$. Therefore a linear operator $A$ on a normed linear space is bounded (and hence continuous) if and only if it is bounded on every closed sphere $\|x\| \le C$, or equivalently, because of the linearity of $A$, on the closed unit sphere $\|x\| \le 1$. In other words, $A$ is bounded if and only if the number

$$\|A\| = \sup_{\|x\| \le 1} \|Ax\| \qquad (1)$$

is finite.


Definition. Given a bounded linear operator $A$ mapping a normed linear space $E$ into another normed linear space $E_1$, the number (1), equal to the least upper bound of $\|Ax\|$ on the closed unit sphere $\|x\| \le 1$, is called the norm of $A$.


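Theorem 2. The norm $\|A\|$ has the following two properties:

$$\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}, \qquad (2)$$

$$\|Ax\| \le \|A\|\,\|x\| \quad \text{for all } x \in E. \qquad (3)$$

Proof. Clearly,

$$\sup_{\|x\| \le 1} \|Ax\| = \sup_{\|x\| = 1} \|Ax\|$$

(why?). But the set of all vectors in $E$ of norm 1 coincides with the set of all vectors

$$\frac{x}{\|x\|} \qquad (x \in E,\ x \ne 0), \qquad (4)$$

and hence

$$\|A\| = \sup_{\|x\| = 1} \|Ax\| = \sup_{x \ne 0} \left\| A \frac{x}{\|x\|} \right\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|},$$

which proves (2). Moreover, since the vectors (4) all have norm 1, it follows from (1) that

$$\frac{\|Ax\|}{\|x\|} = \left\| A \frac{x}{\|x\|} \right\| \le \|A\| \qquad (x \in E,\ x \ne 0),$$

which implies (3) for $x \ne 0$. The validity of (3) for $x = 0$ is obvious. ∎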
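Definition (1) and property (3) can be checked numerically in a simple case. The following sketch assumes $E = E_1 = \mathbf{R}^2$ with the Euclidean norm (an illustrative choice); it compares a sampled value of $\sup_{\|x\| = 1} \|Ax\|$ with the standard closed form for this norm, namely the square root of the largest eigenvalue of $A^{T}A$, and checks (3) on one vector. The function names are hypothetical.

```python
import math


def apply2(a, x):
    """Image of the vector x under the 2x2 matrix a."""
    return (a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1])


def norm(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])


def sampled_norm(a, samples=100_000):
    """Lower estimate of ||A|| = sup_{||x|| = 1} ||Ax|| from points on the unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        best = max(best, norm(apply2(a, (math.cos(t), math.sin(t)))))
    return best


def exact_norm(a):
    """||A|| for the Euclidean norm: square root of the largest eigenvalue of A^T A."""
    p = a[0][0] ** 2 + a[1][0] ** 2            # (A^T A)_11
    r = a[0][1] ** 2 + a[1][1] ** 2            # (A^T A)_22
    q = a[0][0] * a[0][1] + a[1][0] * a[1][1]  # (A^T A)_12
    return math.sqrt((p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q))


if __name__ == "__main__":
    a = [[2.0, 1.0], [0.0, 1.0]]
    print(sampled_norm(a), exact_norm(a))  # the two values agree closely
    # Property (3): ||Ax|| <= ||A|| ||x|| for a vector that is not a unit vector.
    x = (3.0, -4.0)
    print(norm(apply2(a, x)) <= exact_norm(a) * norm(x))  # True
```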
22.3. Sums and products of operators. Let $A$ and $B$ be two operators from one topological linear space $E$ to another topological linear space $E_1$. Then by the sum of $A$ and $B$, denoted by $A + B$, we mean the operator assigning the element

$$y = Ax + Bx \in E_1$$

to each $x \in E$. The domain $D_C$ of the sum $C = A + B$ is just the intersection $D_A \cap D_B$ of the domains of $A$ and $B$. It is clear that $C$ is linear if $A$ and $B$ are linear, and continuous if $A$ and $B$ are continuous. Let $E$ and $E_1$ be normed linear spaces, and suppose $A$ and $B$ are bounded operators. Then $C = A + B$ is also bounded, with norm

$$\|C\| \le \|A\| + \|B\|,$$

since, by Theorem 2 and Problem 10,

$$\|Cx\| = \|Ax + Bx\| \le \|Ax\| + \|Bx\| \le (\|A\| + \|B\|)\,\|x\|$$

for every $x \in E$.

Next, given three topological linear spaces $E$, $E_1$ and $E_2$, let $A$ be an operator from $E$ to $E_1$ and $B$ an operator from $E_1$ to $E_2$. Then by the product of $A$ and $B$, denoted by $BA$ (in that order), we mean the operator assigning the element

$$z = B(Ax) \in E_2$$

to each $x \in E$. The domain $D_C$ of the product $C = BA$ consists of those $x \in D_A$ for which $Ax \in D_B$. Again it is clear that $C$ is linear if $A$ and $B$ are linear, and continuous if $A$ and $B$ are continuous. Let $E$, $E_1$ and $E_2$ be normed linear spaces, and suppose $A$ and $B$ are bounded operators. Then $C = BA$ is also bounded, with norm

$$\|C\| \le \|A\|\,\|B\|,$$

since

$$\|Cx\| = \|B(Ax)\| \le \|B\|\,\|Ax\| \le \|B\|\,\|A\|\,\|x\|.$$
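The inequalities $\|A + B\| \le \|A\| + \|B\|$ and $\|BA\| \le \|B\|\,\|A\|$ can be illustrated with matrices. This sketch assumes $\mathbf{R}^2$ with the norm $\|x\| = \max_i |x_i|$, for which the operator norm of a matrix is its largest absolute row sum (a standard fact); the matrices are arbitrary examples, and here both bounds happen to be attained.

```python
from typing import List

Matrix = List[List[float]]


def inf_norm(a: Matrix) -> float:
    """Operator norm induced by ||x|| = max_i |x_i|: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def add(a: Matrix, b: Matrix) -> Matrix:
    """Matrix of the sum A + B."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def compose(b: Matrix, a: Matrix) -> Matrix:
    """Matrix of the product BA, i.e. apply A first and then B."""
    cols = list(zip(*a))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in b]


if __name__ == "__main__":
    A = [[1.0, -2.0], [3.0, 0.0]]
    B = [[0.0, 1.0], [1.0, 1.0]]
    print(inf_norm(add(A, B)), "<=", inf_norm(A) + inf_norm(B))      # 5.0 <= 5.0
    print(inf_norm(compose(B, A)), "<=", inf_norm(B) * inf_norm(A))  # 6.0 <= 6.0
```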

Remark 1. Sums and products of three or more operators are defined in the natural way, e.g.,

$$CBA = C(BA) = (CB)A,$$

$$A + B + C = A + (B + C) = (A + B) + C.$$

Note that addition of operators is associative and commutative, while multiplication of operators is associative but in general not commutative (give an example where $AB \ne BA$).

Remark 2. By the product $\alpha A$ of the operator $A$ and the number $\alpha$ is meant the operator assigning the element $\alpha Ax$ to each $x \in E$. Let $\mathcal{L}(E, E_1)$ be the set of all continuous linear operators mapping $E$ into $E_1$. Then $\mathcal{L}(E, E_1)$ is clearly a linear space when equipped with the operations of addition of operators and multiplication of operators by numbers.
Problem 1. Prove that every linear operator on a finite-dimensional space is automatically continuous (cf. Problem 2, p. 181).

Problem 2. Let $A$ be a linear operator mapping $m$-space $\mathbf{R}^m$ into $n$-space $\mathbf{R}^n$. Prove that the image of $\mathbf{R}^m$, i.e., the set $\{y : y = Ax,\ x \in \mathbf{R}^m\}$, has dimension no greater than $m$.

Problem 3. Let $C_{[a,b]}$ be the linear space of functions continuous on the interval $a \le x \le b$, equipped with the norm

$$\|f\| = \max_{a \le x \le b} |f(x)|.$$

Let $K(x, y)$ be a fixed function of two variables, continuous on the square $a \le x \le b$, $a \le y \le b$, and let $A$ be the operator defined by

$$g(x) = Af(x) = \int_a^b K(x, y) f(y)\, dy.$$

Prove that $A$ is a continuous linear operator mapping $C_{[a,b]}$ into itself.

Problem 4. Let $C^2_{[a,b]}$ be the space of functions continuous on $[a, b]$, equipped with the norm

$$\|f\| = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2},$$

and let $A$ be the same as in the preceding problem. Prove that $A$ is a continuous linear operator mapping $C^2_{[a,b]}$ into itself.

Problem 5. Given a fixed function $\varphi(x)$ continuous on $[a, b]$, let $A$ be the mapping defined by

$$g(x) = Af(x) = \varphi(x) f(x).$$

Prove that $A$ is a continuous linear operator on both spaces $C_{[a,b]}$ and $C^2_{[a,b]}$, mapping each space into itself.

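Problem 6. Let $C^{(1)}_{[a,b]}$ be the set of all continuously differentiable functions on $[a, b]$, and let $D$ be the differentiation operator, defined by

$$Df(x) = f'(x)$$

for all $f \in C^{(1)}_{[a,b]}$. Prove that

a) $C^{(1)}_{[a,b]}$ is a linear space;

b) $D$ is a linear operator mapping $C^{(1)}_{[a,b]}$ onto $C_{[a,b]}$;

c) $D$ is not continuous on $C^{(1)}_{[a,b]}$ when $C^{(1)}_{[a,b]}$ is equipped with the norm of $C_{[a,b]}$;

d) $D$ is continuous with respect to the norm

$$\|f\|_1 = \max_{a \le x \le b} |f(x)| + \max_{a \le x \le b} |f'(x)|.$$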
Problem 7. Let $K_{[a,b]}$ be the space of infinitely differentiable functions on $[a, b]$, equipped with the topology generated by the countable system of norms

$$\|f\|_n = \sup_{\substack{a \le x \le b \\ 0 \le k \le n}} |f^{(k)}(x)|$$

(cf. Problem 12a, p. 171). Prove that the differentiation operator $D$ is a continuous linear operator on $K_{[a,b]}$, mapping $K_{[a,b]}$ onto itself.

Problem 8. Interpret the differentiation operator as a continuous linear operator on the space of all generalized functions.

Hint. Take continuity to mean that if a sequence of generalized functions $\{f_n(x)\}$ converges to a generalized function $f(x)$, then $\{f_n'(x)\}$ converges to $f'(x)$.

Problem 9. Prove that 

a) The operators in Problems 3-7 and Examples 1-4, p. 222 are all 
bounded; 

b) A linear operator on a countably normed space is continuous if and 
only if it is bounded. 

Problem 10. Let $A$ be a bounded linear operator mapping a normed linear space $E$ into another normed linear space $E_1$. Suppose $\|A\|$ is defined as the smallest number $C$ such that $\|Ax\| \le C\|x\|$ for all $x \in E$. Prove that $\|A\|$ is the same number as in the definition on p. 224. Particularize this to the case of a bounded linear functional on $E$.


Problem 11. Let $E$ and $E_1$ be normed linear spaces, and let $\mathcal{L}(E, E_1)$ be the same as in Remark 2 above. Prove that

a) $\mathcal{L}(E, E_1)$ is a normed linear space;

b) If $E_1$ is complete, so is $\mathcal{L}(E, E_1)$;

c) If $E_1$ is complete, $A_k \in \mathcal{L}(E, E_1)$ and

$$\sum_{k=1}^{\infty} \|A_k\| < \infty,$$

then the series

$$\sum_{k=1}^{\infty} A_k$$

converges to an operator $A \in \mathcal{L}(E, E_1)$ and

$$\|A\| = \left\| \sum_{k=1}^{\infty} A_k \right\| \le \sum_{k=1}^{\infty} \|A_k\|.$$
23. Inverse and Adjoint Operators

23.1. The inverse operator. Invertibility. Given two topological linear spaces $E$ and $E_1$, let $A$ be an operator from $E$ to $E_1$, with domain $D_A \subset E$ and range $R_A = \{y : y = Ax,\ x \in D_A\}$. Then $A$ is said to be invertible if the equation

$$Ax = y \qquad (1)$$

has a unique solution for every $y \in R_A$. If $A$ is invertible, we can associate the unique solution of (1) with each $y \in R_A$. This gives an operator, with domain $R_A$, called the inverse of $A$ and denoted by $A^{-1}$.

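Theorem 1. The inverse $A^{-1}$ of a linear operator $A$ is itself linear.

Proof. If

$$Ax_1 = y_1, \qquad Ax_2 = y_2,$$

then

$$A^{-1}y_1 = x_1, \qquad A^{-1}y_2 = x_2,$$

and hence

$$\alpha_1 A^{-1}y_1 + \alpha_2 A^{-1}y_2 = \alpha_1 x_1 + \alpha_2 x_2. \qquad (2)$$

On the other hand,

$$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 y_1 + \alpha_2 y_2,$$

by the linearity of $A$, and hence

$$A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 x_1 + \alpha_2 x_2. \qquad (3)$$

Comparing (2) and (3), we get

$$A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 A^{-1}y_1 + \alpha_2 A^{-1}y_2. \qquad ∎$$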

Lemma. If $M$ is an everywhere dense subset of a normed linear space $E$, then every nonzero element $y \in E$ is the sum of a series of the form

$$y = y_1 + y_2 + \cdots + y_k + \cdots,$$

where $y_k \in M$ and

$$\|y_k\| \le \frac{3\|y\|}{2^k} \qquad (k = 1, 2, \dots).$$

Proof. Since $M$ is everywhere dense in $E$, given any $y \in E$, there is an element $y_1 \in M$ such that

$$\|y - y_1\| \le \frac{\|y\|}{2}.$$
By Baire's theorem (Theorem 3, p. 61), at least one of the sets $M_k$, say $M_n$, is dense in some (open) sphere $S \subset E_1$. Choosing a point $y_0 \in S \cap M_n$, we can find numbers $\alpha$ and $\beta$ ($\alpha < \beta$) such that $S$ contains the spherical layer

$$P = \{z : \alpha < \|z - y_0\| < \beta\}.$$

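Shifting $P$ so that its center coincides with the origin, we get another spherical layer $P_0$. Some set $M_N$ is dense in $P_0$. In fact, if $z \in P \cap M_n$, then $z - y_0 \in P_0$ and

$$\begin{aligned}
\|A^{-1}(z - y_0)\| &\le \|A^{-1}z\| + \|A^{-1}y_0\| \le n(\|z\| + \|y_0\|) \le n(\|z - y_0\| + 2\|y_0\|) \\
&= n\|z - y_0\|\left(1 + \frac{2\|y_0\|}{\|z - y_0\|}\right) \le n\|z - y_0\|\left(1 + \frac{2\|y_0\|}{\alpha}\right),
\end{aligned} \qquad (4)$$

where the quantity

$$n\left(1 + \frac{2\|y_0\|}{\alpha}\right)$$

is independent of $z$. Let

$$N = 1 + \left[ n\left(1 + \frac{2\|y_0\|}{\alpha}\right) \right]$$

(recall footnote 4, p. 8). Then, by (4), $z - y_0 \in M_N$. Hence $M_N$ is dense in $P_0$, since $M_n$ is dense in $P$.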

Now, given any nonzero element $y \in E_1$, we can always find a number $\lambda \ne 0$ such that $\alpha < \|\lambda y\| < \beta$, i.e., such that $\lambda y \in P_0$. Since $M_N$ is dense in $P_0$, there is a sequence $\{\eta_k\}$, $\eta_k \in M_N$, converging to $\lambda y$. Then $\{\eta_k/\lambda\}$ converges to $y$. Clearly, if $\eta \in M_N$, then $\eta/\lambda \in M_N$ for any $\lambda \ne 0$. Therefore $M_N$ is dense in $E_1 - \{0\}$ and hence in $E_1$ itself. It follows from the lemma that $y$ is the sum of a series of the form

$$y = y_1 + y_2 + \cdots + y_k + \cdots,$$

where $y_k \in M_N$ and

$$\|y_k\| \le \frac{3\|y\|}{2^k}.$$

Consider the series

$$\sum_{k=1}^{\infty} x_k \qquad (5)$$

with terms $x_k = A^{-1}y_k \in E$, equal to the preimages of the elements $y_k \in E_1$. Since

$$\|x_k\| = \|A^{-1}y_k\| \le N\|y_k\| \le \frac{3N\|y\|}{2^k},$$
the series (5) converges to an element $x \in E$, where

$$\|x\| \le \sum_{k=1}^{\infty} \|x_k\| \le 3N\|y\| \sum_{k=1}^{\infty} \frac{1}{2^k} = 3N\|y\|.$$

Since (5) is convergent and the operator $A$ is continuous on $E$ (being bounded), we can apply $A$ term by term to (5), obtaining

$$Ax = Ax_1 + Ax_2 + \cdots + Ax_k + \cdots = y_1 + y_2 + \cdots + y_k + \cdots = y,$$

which implies

$$x = A^{-1}y.$$

Moreover,

$$\|A^{-1}y\| = \|x\| \le 3N\|y\|$$

for all $y \ne 0$, and hence $A^{-1}$ is bounded. ∎


Theorem 3. Let $A_0$ be an invertible bounded linear operator mapping a Banach space $E$ onto another Banach space $E_1$, and let $\Delta A$ be a bounded linear operator mapping $E$ into $E_1$ such that

$$\|\Delta A\| < \frac{1}{\|A_0^{-1}\|}. \qquad (6)$$

Then the operator

$$A = A_0 + \Delta A$$

maps $E$ onto $E_1$ and has a bounded inverse.

Proof. Let $y$ be a fixed element of $E_1$, and consider the mapping $B$ of the space $E$ into itself defined by

$$Bx = A_0^{-1}y - A_0^{-1}\Delta A x.$$

It follows from (6) that $B$ is a contraction mapping, since

$$\|Bx - Bx'\| = \|A_0^{-1}\Delta A(x - x')\| \le \|A_0^{-1}\|\,\|\Delta A\|\,\|x - x'\|, \qquad \|A_0^{-1}\|\,\|\Delta A\| < 1.$$

Hence, by Theorem 1, p. 66, $B$ has a unique fixed point $x$ such that

$$x = Bx = A_0^{-1}y - A_0^{-1}\Delta A x. \qquad (7)$$

But (7) implies

$$Ax = A_0 x + \Delta A x = y.$$

Clearly, if $Ax' = y$, then $x'$ is also a fixed point of $B$, and hence $x' = x$. Therefore, given any $y \in E_1$, the equation $Ax = y$ has a unique solution in $E$, i.e., the operator $A$ is invertible with inverse $A^{-1}$. Moreover, $A^{-1}$ is bounded, by Theorem 2. ∎
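The proof of Theorem 3 is constructive: the solution of $Ax = y$ is the limit of the successive approximations $x_{k+1} = Bx_k$. Here is a minimal sketch in $\mathbf{R}^2$, assuming the max-row-sum norm and a diagonal $A_0$ chosen so that $A_0^{-1}$ is known explicitly; all values and helper names are illustrative.

```python
def matvec(a, x):
    """Coordinates of the image of x under the matrix a."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def solve_by_contraction(a0_inv, delta, y, tol=1e-12, max_iter=10_000):
    """Solve (A0 + Delta A) x = y as the fixed point of
    Bx = A0^{-1} y - A0^{-1} (Delta A) x, using successive approximations."""
    c = matvec(a0_inv, y)  # the constant part A0^{-1} y
    x = [0.0] * len(y)
    for _ in range(max_iter):
        d = matvec(a0_inv, matvec(delta, x))
        x_new = [ci - di for ci, di in zip(c, d)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence: check that ||Delta A|| < 1 / ||A0^{-1}||")


if __name__ == "__main__":
    a0_inv = [[0.5, 0.0], [0.0, 0.25]]  # A0 = diag(2, 4), so ||A0^{-1}|| = 0.5
    delta = [[0.5, 0.3], [-0.2, 0.6]]   # ||Delta A|| = 0.8 < 1 / 0.5 = 2
    y = [1.0, 2.0]
    x = solve_by_contraction(a0_inv, delta, y)
    a = [[2.5, 0.3], [-0.2, 4.6]]       # A = A0 + Delta A
    print(x, matvec(a, x))              # the second list should be close to y
```

The stopping rule (a step smaller than `tol`) is one simple choice; with contraction constant $q = \|A_0^{-1}\|\,\|\Delta A\| = 0.4$, the remaining error is at most $q/(1 - q)$ times the last step.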


Theorem 4. Let $E$ be a Banach space, and let $I$ be the identity operator on $E$. Suppose $A$ is a bounded linear operator mapping $E$ into itself, such that

$$\|A\| < 1. \qquad (8)$$

Then the operator $(I - A)^{-1}$ exists, is bounded and can be represented in the form

$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k. \qquad (9)$$

Proof. The existence and boundedness of $(I - A)^{-1}$ follows from Theorem 3 (and will also emerge in the course of the proof). It follows from (8) that

$$\sum_{k=0}^{\infty} \|A^k\| \le \sum_{k=0}^{\infty} \|A\|^k < \infty.$$

But then, by the completeness of $E$, the sum of the series

$$\sum_{k=0}^{\infty} A^k$$

is a bounded linear operator (see Problem 11c, p. 227). Given any $n$, we have

$$(I - A)\sum_{k=0}^{n} A^k = \sum_{k=0}^{n} A^k (I - A) = I - A^{n+1}.$$

Hence, taking the limit as $n \to \infty$ and bearing in mind that

$$\|A^{n+1}\| \le \|A\|^{n+1} \to 0,$$

we get

$$(I - A)\sum_{k=0}^{\infty} A^k = \sum_{k=0}^{\infty} A^k (I - A) = I,$$

which implies (9). ∎
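Formula (9), the Neumann series, can be tried out on a $2 \times 2$ matrix. The sketch below assumes the max-row-sum norm, in which the sample matrix has norm $0.7 < 1$, so that (8) holds; it forms the partial sum $I + A + \cdots + A^{n}$ and compares it with $(I - A)^{-1}$. The helper names are hypothetical.

```python
def matmul(a, b):
    """Matrix product ab."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(n):
    """The n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_partial_sum(a, terms):
    """Partial sum I + A + A^2 + ... + A^(terms - 1) of the series (9)."""
    n = len(a)
    total, power = identity(n), identity(n)
    for _ in range(terms - 1):
        power = matmul(power, a)
        total = [[t + p for t, p in zip(rt, rp)] for rt, rp in zip(total, power)]
    return total


if __name__ == "__main__":
    a = [[0.2, 0.5], [0.1, 0.3]]  # largest absolute row sum 0.7 < 1
    s = neumann_partial_sum(a, 200)
    i_minus_a = [[1 - a[0][0], -a[0][1]], [-a[1][0], 1 - a[1][1]]]
    print(matmul(i_minus_a, s))   # close to the identity matrix
```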


23.2. The adjoint operator. Given two topological linear spaces $E$ and $E_1$, let $A$ be a continuous linear operator mapping $E$ into $E_1$, and let $g$ be a continuous linear functional on $E_1$, i.e., an element of the conjugate space $E_1^*$. Suppose we apply $g$ to the element $y = Ax$, thereby obtaining a new functional

$$f(x) = g(Ax) \qquad (x \in E). \qquad (10)$$

Clearly, $f$ is continuous and linear (why?), and hence an element of the conjugate space $E^*$. Thus (10) associates a functional $f \in E^*$ with each functional $g \in E_1^*$, i.e., (10) defines an operator mapping $E_1^*$ into $E^*$. This operator is called the adjoint of $A$, and is denoted by $A^*$. Using the symmetric notation $(f, x)$ for the functional $f(x)$, we can write (10) in the form

$$(g, Ax) = (f, x),$$

or

$$(g, Ax) = (A^*g, x). \qquad (11)$$

Equation (11) can be regarded as a concise definition of the adjoint of $A$.
Example. As in Example 3, p. 222, suppose $A$ is a linear operator with matrix $\|a_{ij}\|$ mapping $m$-space $\mathbf{R}^m$ into $n$-space $\mathbf{R}^n$. Then the mapping $y = Ax$ can be written as a system of equations

$$y_i = \sum_{j=1}^{m} a_{ij} x_j \qquad (i = 1, \dots, n), \qquad (12)$$

while the functional $f(x)$ can be written in the form

$$f(x) = \sum_{j=1}^{m} f_j x_j,$$

where $f_j = f(e_j)$ in terms of a basis $e_1, \dots, e_m$ in $\mathbf{R}^m$. Since

$$f(x) = g(Ax) = \sum_{i=1}^{n} g_i y_i = \sum_{i=1}^{n} \sum_{j=1}^{m} g_i a_{ij} x_j = \sum_{j=1}^{m} x_j \sum_{i=1}^{n} g_i a_{ij},$$

we find that

$$f_j = \sum_{i=1}^{n} a_{ij} g_i,$$

or

$$f_i = \sum_{j=1}^{n} a_{ji} g_j \qquad (13)$$

after interchanging the roles of the indices $i$ and $j$. But $f = A^*g$, and hence, comparing (12) and (13), we see that the matrix of the operator $A^*$ is $\|a_{ji}\|$, i.e., the transpose of the matrix of $A$.
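The Example can be checked numerically: with $g$ and $x$ given by coordinates, $(g, Ax)$ should coincide with $(A^{*}g, x)$ when $A^{*}$ is taken to be the transpose. A minimal sketch with illustrative integer data; the helper names are hypothetical.

```python
def matvec(a, x):
    """Coordinates of the image of x under the matrix a."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    """Rows of the transposed matrix ||a_ji||."""
    return [list(col) for col in zip(*a)]


def pairing(g, y):
    """The value (g, y) = sum_i g_i y_i of the functional with coordinates g at y."""
    return sum(gi * yi for gi, yi in zip(g, y))


if __name__ == "__main__":
    a = [[1, 2, 0], [0, -1, 3]]  # A: R^3 -> R^2
    x = [1, 1, 2]                # a vector in R^3
    g = [2, -1]                  # a functional on R^2
    print(pairing(g, matvec(a, x)))             # (g, Ax) = 1
    print(pairing(matvec(transpose(a), g), x))  # (A*g, x) = 1
```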


It follows at once from the definition of the adjoint of an operator that

1) $A^*$ is linear;

2) $(A + B)^* = A^* + B^*$;

3) $(\alpha A)^* = \alpha A^*$ for arbitrary complex $\alpha$.

A somewhat less obvious property of the adjoint operator is given by

Theorem 5. Let $A$ be a bounded linear operator mapping a Banach space $E$ into another Banach space $E_1$, and let $A^*$ be the adjoint of $A$. Then $A^*$ is bounded and

$$\|A^*\| = \|A\|. \qquad (14)$$

Proof. By the properties of the norm of an operator, we have

$$|(A^*g, x)| = |(g, Ax)| \le \|g\|\,\|A\|\,\|x\|,$$

which implies

$$\|A^*g\| \le \|A\|\,\|g\|,$$

and hence

$$\|A^*\| \le \|A\|. \qquad (15)$$
Suppose $x \in E$, $Ax \ne 0$, and let

$$y_0 = \frac{Ax}{\|Ax\|} \in E_1,$$

so that, in particular, $\|y_0\| = 1$. Let $g$ be the functional such that

$$g(\lambda y_0) = \lambda$$

on the set $L \subset E_1$ of all elements of the form $\lambda y_0$. Then clearly $(g, y_0) = 1$ and $\|g\|_L = 1$. Using the Hahn-Banach theorem, we can extend $g$ to a functional on the whole space $E_1$ such that $\|g\| = 1$ and

$$(g, y_0) = 1, \quad \text{i.e.,} \quad (g, Ax) = \|Ax\|.$$

Therefore

$$\|Ax\| = (g, Ax) = |(A^*g, x)| \le \|A^*g\|\,\|x\| \le \|A^*\|\,\|g\|\,\|x\| = \|A^*\|\,\|x\|,$$

and since this inequality is trivial when $Ax = 0$, it holds for all $x \in E$, which implies

$$\|A\| \le \|A^*\|. \qquad (16)$$

Comparing (15) and (16), we get (14). ∎

23.3. The adjoint operator in Hilbert space. Self-adjoint operators. Next we consider the case where $A$ is a bounded linear operator mapping a (real or complex) Hilbert space $H$ into itself. According to the corollary to Theorem 2, p. 188, the mapping $\tau$ assigning the linear functional

$$(\tau y)(x) = (x, y)$$

to every $y \in H$ establishes an isomorphism² between $H$ and the conjugate space $H^*$. Let $A^*$ be the adjoint of the operator $A$. Then clearly the mapping $\tilde{A}^* = \tau^{-1}A^*\tau$ is a bounded linear operator mapping $H$ into itself, such that

$$(Ax, y) = (x, \tilde{A}^* y) \qquad (17)$$

for all $x, y \in H$. Moreover $\|\tilde{A}^*\| = \|A\|$, since $\|A^*\| = \|A\|$ and the mappings $\tau$ and $\tau^{-1}$ are isometric.

We now establish the following convention: If $H$ is a Hilbert space, then by the adjoint of an operator $A$ mapping $H$ into $H$, we mean the operator $\tilde{A}^*$ defined by (17). Note that $\tilde{A}^*$, like $A$, maps $H$ into $H$. To keep the notation simple, we will henceforth drop the tilde, writing $A^*$ instead of $\tilde{A}^*$. Replacing $\tilde{A}^*$ by $A^*$ in (17), we get

$$(Ax, y) = (x, A^*y) \qquad (17')$$

for all $x, y \in H$.

² Or a "conjugate-linear isomorphism" in the case where $H$ is complex (see Problem 6, p. 194).
Remark. It should be emphasized that this definition of $A^*$ differs from the definition of the adjoint of an operator $A$ mapping an arbitrary Banach space $E$ into itself, in which case $A^*$ is defined on the conjugate space $E^*$ rather than on the space $E$ itself. The context will always make it clear whether $A^*$ is the operator defined by (11) or the operator defined by (17').

Let A be a bounded linear operator mapping a Hilbert space H into itself. 
Then it makes sense to ask whether or not A = A*, since A and A* are 
defined on the same space. This leads to the following 

Definition. A bounded linear operator $A$ mapping a Hilbert space $H$ into itself is said to be self-adjoint if $A = A^*$, i.e., if

$$(Ax, y) = (x, Ay)$$

for all $x, y \in H$.

Remark. Everything said above continues to hold if we replace $H$ by the real $n$-space $\mathbf{R}^n$ or complex $n$-space $\mathbf{C}^n$.

23.4. The spectrum of an operator. The resolvent. In the theory of linear operators and their applications, a central role is played by the notion of the "spectrum" of an operator.³ Let $A$ be a linear operator mapping a topological linear space $E$ into itself. Then a number $\lambda$ is called an eigenvalue of $A$ if the equation

$$Ax = \lambda x$$

has at least one nonzero solution, and every such solution $x$ is called an eigenvector of $A$ (corresponding to the eigenvalue $\lambda$). Suppose $E$ is finite-dimensional. Then the set of all eigenvalues of $A$ is called the spectrum of $A$, and all other values of $\lambda$ are said to be regular (points). In other words, $\lambda$ is regular if and only if the operator $A - \lambda I$ is invertible. The operator $(A - \lambda I)^{-1}$ is then automatically bounded, like every linear operator on a finite-dimensional space (cf. Problem 1, p. 226). Thus there are just two possibilities in the finite-dimensional case:

1) The equation $Ax = \lambda x$ has a nonzero solution, i.e., $\lambda$ is an eigenvalue of $A$, so that the operator $(A - \lambda I)^{-1}$ fails to exist;

2) The operator $(A - \lambda I)^{-1}$ exists and is bounded, i.e., $\lambda$ is a regular point.

However, in the case where $E$ is infinite-dimensional, there is a third possibility:

3) The operator $(A - \lambda I)^{-1}$ exists (i.e., the equation $Ax = \lambda x$ has no nonzero solutions), but is not bounded.

³ In talking about the spectrum of an operator, it will always be tacitly assumed that the operator is defined on a complex space.
To describe this more general situation, we introduce some new terminology and make an important modification in the definition of the spectrum. Given an operator $A$ mapping a (complex) topological linear space $E$ into itself, the operator

$$R_\lambda = (A - \lambda I)^{-1} \qquad (18)$$

is called the resolvent of $A$. The values of $\lambda$ for which $R_\lambda$ is defined on all of $E$ and continuous are said to be regular (points) of $A$, and the set of all other values of $\lambda$ is called the spectrum of $A$. The eigenvalues of $A$ still belong to the spectrum, since if $(A - \lambda I)x = 0$ for some $x \ne 0$, then (18) fails to exist. The set of all these eigenvalues is now called the point spectrum, and the rest of the spectrum is called the continuous spectrum. In other words, the continuous spectrum consists of all $\lambda$ for which (18) exists but fails to be continuous. Thus there are now exactly three possibilities for any given value of $\lambda$:

1) $\lambda$ is a regular point;

2) $\lambda$ is an eigenvalue;

3) $\lambda$ is a point of the continuous spectrum.

The possibility of an operator having a continuous spectrum is a characteristic feature of the theory of operators in infinite-dimensional spaces, distinguishing it from the finite-dimensional case.

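Theorem 6. Let $A$ be a linear operator mapping a Banach space $E$ into itself. Then the set $\Lambda$ of all regular points of $A$ is open (equivalently, the spectrum of $A$, which is the complement of $\Lambda$, is closed).

Proof. If $\lambda$ is regular, the operator $(A - \lambda I)^{-1}$ exists and is bounded. Hence, for sufficiently small $\delta$, the operator $(A - (\lambda + \delta)I)^{-1}$ also exists and is bounded, by Theorem 3 applied with $A_0 = A - \lambda I$ and $\Delta A = -\delta I$, since condition (6) then reads $|\delta| < 1/\|(A - \lambda I)^{-1}\|$. In other words, the point $\lambda + \delta$ is regular for sufficiently small $\delta$. ∎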

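Theorem 7. If $A$ is a bounded linear operator mapping a Banach space $E$ into itself and if $|\lambda| > \|A\|$, then $\lambda$ is a regular point. In other words, the spectrum of $A$ is contained in the disk of radius $\|A\|$ with center at the origin.

Proof. Obviously

$$A - \lambda I = -\lambda\left(I - \frac{1}{\lambda}A\right)$$

and

$$R_\lambda = (A - \lambda I)^{-1} = -\frac{1}{\lambda}\left(I - \frac{A}{\lambda}\right)^{-1}.$$

If $\|A\| < |\lambda|$, then $\|A/\lambda\| < 1$, and hence $R_\lambda$ exists and is bounded, by Theorem 4. ∎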
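In finite dimensions the spectrum consists of the eigenvalues, so Theorem 7 says that every eigenvalue satisfies $|\lambda| \le \|A\|$. This sketch computes the (complex) eigenvalues of a $2 \times 2$ matrix from the characteristic polynomial and compares their moduli with the max-row-sum norm; the matrix is an arbitrary example whose eigenvalues $(1 \pm i\sqrt{7})/2$ have modulus $\sqrt{2} < 2 = \|A\|$.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of det(A - lambda I) = lambda^2 - tr(A) lambda + det(A) = 0."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def inf_norm(a):
    """Operator norm for ||x|| = max_i |x_i| (largest absolute row sum)."""
    return max(sum(abs(v) for v in row) for row in a)


if __name__ == "__main__":
    a = [[0.0, -2.0], [1.0, 1.0]]
    for lam in eigenvalues_2x2(a):
        print(lam, abs(lam), "<=", inf_norm(a))
```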


Example 1. In the space $C = C_{[0,1]}$, consider the operator $A$ defined by

$$Ax(t) = \mu(t)x(t),$$

where $\mu(t)$ is a fixed function continuous on $[0, 1]$. Then

$$(A - \lambda I)x(t) = (\mu(t) - \lambda)x(t),$$

and

$$(A - \lambda I)^{-1}x(t) = \frac{1}{\mu(t) - \lambda}\,x(t).$$

Hence the spectrum of $A$ consists of all $\lambda$ such that $\mu(t) - \lambda$ vanishes for some $t$ in the interval $[0, 1]$, i.e., the spectrum is the range of the function $\mu(t)$.

Example 2. Suppose $\mu(t) = t$ in the preceding example. Then the spectrum is just the interval $[0, 1]$. On the other hand, there are no eigenvalues: if $(t - \lambda)x(t) = 0$ for all $t \in [0, 1]$, then $x(t) = 0$ for all $t \ne \lambda$, and hence $x \equiv 0$ by continuity. Thus the operator $A$ defined by

$$Ax(t) = tx(t)$$

is an example of an operator with a purely continuous spectrum.
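A finite-dimensional caricature of Example 2 (an assumption made only for illustration) restricts $A$ to the grid $t_i = i/n$, where it becomes the diagonal matrix $\operatorname{diag}(t_0, \dots, t_n)$ and $(A - \lambda I)^{-1}$ has norm $\max_i 1/|t_i - \lambda|$. For $\lambda$ inside $[0, 1]$ this norm grows without bound as the grid is refined, mirroring the unboundedness of $(A - \lambda I)^{-1}$ on $C_{[0,1]}$, while for $\lambda$ outside $[0, 1]$ it stays bounded. Each grid operator itself has only eigenvalues, so this is a heuristic picture, not a proof; the function name is hypothetical.

```python
def resolvent_norm_on_grid(lam, n):
    """Sup norm of (A - lam I)^{-1} when Ax(t) = t x(t) is restricted to the grid
    t_i = i / n, where it acts as the diagonal matrix diag(t_0, ..., t_n).
    Assumes lam is not itself a grid point."""
    return max(1.0 / abs(i / n - lam) for i in range(n + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # lam = 1/3 lies in the spectrum [0, 1]; lam = 2 is a regular point.
        print(n, resolvent_norm_on_grid(1 / 3, n), resolvent_norm_on_grid(2.0, n))
```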

Finally, for self-adjoint operators in a Hilbert space, we have the following 
analogue of a well-known result for finite-dimensional Euclidean spaces 
(proved in exactly the same way): 

Theorem 8. Let $A$ be a self-adjoint operator mapping a (complex) Hilbert space $H$ into itself. Then all the eigenvalues of $A$ are real, and two eigenvectors of $A$ corresponding to distinct eigenvalues are orthogonal.

Proof. If

$$Ax = \lambda x \qquad (x \ne 0),$$

then

$$\lambda(x, x) = (Ax, x) = (x, Ax) = (x, \lambda x) = \bar{\lambda}(x, x),$$

and hence $\lambda = \bar{\lambda}$, since $(x, x) \ne 0$. Moreover, if

$$Ax = \lambda x, \qquad Ay = \mu y \qquad (\lambda \ne \mu),$$

then

$$\lambda(x, y) = (Ax, y) = (x, Ay) = (x, \mu y) = \bar{\mu}(x, y) = \mu(x, y),$$

and hence

$$(x, y) = 0,$$

i.e., the vectors $x$ and $y$ are orthogonal. ∎
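Theorem 8 can be observed in $\mathbf{C}^2$. The sketch assumes the inner product $(x, y) = \sum_i x_i \bar{y}_i$ and a sample Hermitian matrix (equal to its conjugate transpose); it computes both eigenvalues from the characteristic polynomial over the complex numbers, so that their realness is a result rather than an assumption, and checks that the corresponding eigenvectors are orthogonal. The helper names are hypothetical.

```python
import cmath


def inner(x, y):
    """Inner product (x, y) = sum_i x_i * conj(y_i) on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def matvec(a, x):
    """Coordinates of the image of x under the matrix a."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def eig_2x2(a):
    """Eigenvalues of a 2x2 complex matrix and, assuming a[0][1] != 0,
    the eigenvector (a01, lambda - a00) belonging to each of them."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    lams = [(tr + disc) / 2, (tr - disc) / 2]
    return [(lam, [a[0][1], lam - a[0][0]]) for lam in lams]


if __name__ == "__main__":
    h = [[2, 1 + 1j], [1 - 1j, 3]]  # Hermitian sample matrix
    (l1, v1), (l2, v2) = eig_2x2(h)
    print(l1, l2)                   # imaginary parts are zero
    print(inner(v1, v2))            # eigenvectors for distinct eigenvalues are orthogonal
```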

Problem 1. Given two normed linear spaces $E$ and $E_1$, a linear operator $A$ from $E$ to $E_1$, with domain $D_A$, is said to be closed if $x_n \in D_A$, $x_n \to x$, $Ax_n \to y$ implies $x \in D_A$, $Ax = y$. Prove that every bounded operator is closed.

Problem 2. Let $E$ and $E_1$ be normed linear spaces, with norms $\|\cdot\|$ and $\|\cdot\|_1$, respectively. By the direct (or Cartesian) product of $E$ and $E_1$, denoted by $E \times E_1$, we mean the set of all ordered pairs $(x, y)$, $x \in E$, $y \in E_1$. Prove that $E \times E_1$ is a normed linear space when equipped with the norm

$$\|(x, y)\| = \|x\| + \|y\|_1$$

(addition of elements and multiplication of elements by numbers being defined in the obvious way). By the graph of a linear operator $A$ from $E$ to $E_1$ we mean the subset of $E \times E_1$ equal to

$$G_A = \{(x, y) : x \in D_A,\ y = Ax\}.$$

Prove that

a) $G_A$ is a linear subspace of $E \times E_1$;

b) $G_A$ is closed if and only if the operator $A$ is closed;

c) If $E$ and $E_1$ are Banach spaces and if $A$ is closed and defined for all $x \in E$, so that $D_A = E$, then $A$ is bounded (this is Banach's closed graph theorem).

Hint. In c) apply Theorem 2 to the projection operator $P$ carrying each ordered pair $(x, Ax) \in G_A$ into the element $x \in E$.

Problem 3. Prove that if $A$ is an invertible continuous linear operator mapping a complete countably normed space $E$ into another complete countably normed space $E_1$, then the inverse operator $A^{-1}$ is itself continuous. State and prove the closed graph theorem for countably normed spaces.

Problem 4. Let $A$ be a continuous linear operator mapping a Banach space $E$ onto another Banach space $E_1$. Prove that there is a constant $\alpha > 0$ such that if $B \in \mathcal{L}(E, E_1)$ and $\|A - B\| < \alpha$, then $B$ also maps $E$ onto (all of) $E_1$.

Problem 5. Let $A$ be an operator mapping a Hilbert space $H$ into itself. Then a subspace $M \subset H$ is said to be invariant under $A$ if $x \in M$ implies $Ax \in M$. Prove that if $M$ is invariant under $A$, then its orthogonal complement $M' = H \ominus M$ is invariant under the adjoint operator $A^*$ (in particular, under $A$ itself if $A$ is self-adjoint).

Problem 6. Let $A$ and $B$ be bounded linear operators mapping a complex Hilbert space $H$ into itself. Prove that

a) $(\alpha A + \beta B)^* = \bar{\alpha}A^* + \bar{\beta}B^*$;

b) $(AB)^* = B^*A^*$;

c) $(A^*)^* = A$;

d) $I^* = I$, where $I$ is the identity operator.
Problem 7. Give an example of an operator whose spectrum consists of a single point.

Problem 8. Given a bounded linear operator $A$ mapping a Banach space $E$ into itself, prove that the limit

$$r = \lim_{n \to \infty} \sqrt[n]{\|A^n\|}$$

exists. Show that the spectrum of $A$ is contained in the disk of radius $r$ with center at the origin.

Comment. The quantity $r$ is called the spectral radius of the operator $A$. This result contains Theorem 7 as a special case, since $\|A^n\| \le \|A\|^n$.
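The limit in Problem 8 can be observed numerically. The sketch below assumes the max-row-sum norm and an illustrative upper-triangular matrix; it lists $\|A^n\|^{1/n}$. Here $\|A\| = 10.5$, while the values decrease towards $0.5$, the modulus of the only eigenvalue (for matrices this limit is known to equal the largest eigenvalue modulus), so the disk of radius $r$ can be much smaller than the disk of Theorem 7.

```python
def matmul(a, b):
    """Matrix product ab."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inf_norm(a):
    """Operator norm for ||x|| = max_i |x_i| (largest absolute row sum)."""
    return max(sum(abs(v) for v in row) for row in a)


def root_norms(a, n_max):
    """Return the list ||A^n||^(1/n) for n = 1, ..., n_max."""
    out, power = [], a
    for n in range(1, n_max + 1):
        if n > 1:
            power = matmul(power, a)
        out.append(inf_norm(power) ** (1.0 / n))
    return out


if __name__ == "__main__":
    a = [[0.5, 10.0], [0.0, 0.5]]  # ||A|| = 10.5, but the only eigenvalue is 0.5
    r = root_norms(a, 500)
    print(r[0], r[9], r[99], r[499])  # decreasing towards 0.5
```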

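Problem 9. Let $R_\lambda = (A - \lambda I)^{-1}$ and $R_\mu = (A - \mu I)^{-1}$ be the resolvents corresponding to the points $\lambda$ and $\mu$. Prove that $R_\lambda R_\mu = R_\mu R_\lambda$ and

$$R_\mu - R_\lambda = (\mu - \lambda) R_\mu R_\lambda. \qquad (19)$$

Hint. Multiply both sides of (19) by $(A - \lambda I)(A - \mu I)$.

Comment. It follows from (19) that if $\lambda_0$ is a regular point of $A$, then the derivative of $R_\lambda$ with respect to $\lambda$ at the point $\lambda_0$, i.e., the limit

$$\lim_{\Delta\lambda \to 0} \frac{R_{\lambda_0 + \Delta\lambda} - R_{\lambda_0}}{\Delta\lambda}$$

(in the sense of convergence with respect to the operator norm) exists and equals $R_{\lambda_0}^2$.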
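Identity (19) and the derivative formula in the Comment can be spot-checked on a $2 \times 2$ matrix. A minimal sketch, assuming an illustrative matrix with eigenvalues 1 and 3 and two complex regular points $\lambda, \mu$; it uses the adjugate formula for the inverse, and the helper names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def resolvent(a, lam):
    """R_lambda = (A - lambda I)^{-1} for a 2x2 matrix A."""
    return inv2([[a[0][0] - lam, a[0][1]], [a[1][0], a[1][1] - lam]])


def max_diff(p, q):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(p[i][j] - q[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    a = [[1.0, 2.0], [0.0, 3.0]]  # eigenvalues 1 and 3
    lam, mu = 2 + 1j, -1 + 0.5j   # two regular points
    r_lam, r_mu = resolvent(a, lam), resolvent(a, mu)
    lhs = [[r_mu[i][j] - r_lam[i][j] for j in range(2)] for i in range(2)]
    rhs = [[(mu - lam) * v for v in row] for row in matmul(r_mu, r_lam)]
    print(max_diff(lhs, rhs))                                  # rounding-level
    print(max_diff(matmul(r_lam, r_mu), matmul(r_mu, r_lam)))  # the resolvents commute
```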

Problem 10. Let A be a bounded self-adjoint operator mapping a complex 
Hilbert space H into itself. Prove that the spectrum of A is a closed bounded 
subset of the real line. 

Problem 11. Prove that every bounded linear operator defined on a complex Banach space with at least one nonzero element has a nonempty spectrum.

24. Completely Continuous Operators 

24.1. Definitions and examples. We now discuss a class of operators which 
closely resemble operators acting in a finite-dimensional space and at the 
same time are very important from the standpoint of applications: 

Definition. A linear operator A mapping a Banach space E into 
itself is said to be completely continuous if it maps every bounded set into 
a relatively compact set. 
Remark 1. If $E$ is finite-dimensional, then every linear operator $A$ mapping $E$ into $E$ is completely continuous. In fact, $A$ maps bounded sets into bounded sets (recall Problem 1, p. 226) and hence maps bounded sets into relatively compact sets (why?).

Remark 2. In an infinite-dimensional space, complete continuity of an 
operator is a stronger requirement than merely being continuous (i.e., 
bounded). For example, the identity operator in an infinite-dimensional 
space is continuous but not completely continuous (see Example 1 below). 

Lemma. Let $x_1, x_2, \dots$ be linearly independent vectors in a normed linear space $E$, and let $E_n$ be the subspace generated by the vectors $x_1, \dots, x_n$. Then there are vectors $y_1, y_2, \dots$ such that $y_n \in E_n$, $\|y_n\| = 1$ and⁴

$$\rho(E_{n-1}, y_n) = \inf_{x \in E_{n-1}} \|x - y_n\| > \frac{1}{2}.$$

Proof. Since the vectors $x_1, x_2, \dots$ are linearly independent, we have $x_n \notin E_{n-1}$ and hence

$$\rho(E_{n-1}, x_n) = \alpha > 0$$

(recall Problem 5a, p. 141). Let $x^*$ be a vector in $E_{n-1}$ such that


Example 1. The identity operator $I$ in an infinite-dimensional Banach space $E$ is not completely continuous. In fact, we need only show that the closed unit sphere $S$ in $E$ (which is obviously carried into itself by $I$) is not compact. This follows at once from the lemma: since $E$ is infinite-dimensional, it contains linearly independent vectors $x_1, x_2, \dots$, and then $S$ contains a sequence of vectors $y_1, y_2, \dots$ such that

$$\|y_m - y_n\| \ge \rho(E_{n-1}, y_n) > \frac{1}{2} \qquad (m < n),$$

because $y_m \in E_m \subset E_{n-1}$. Such a sequence clearly cannot contain a convergent subsequence.

Example 2. Let $A$ be a continuous linear operator on an infinite-dimensional Banach space $E$, where $A$ is "degenerate" in the sense that it maps $E$ into a finite-dimensional subspace of $E$. Then $A$ is completely continuous, since it maps every bounded subset $M \subset E$ into a bounded subset of a finite-dimensional space, and hence into a relatively compact set.

⁴ The quantity $\rho(E_{n-1}, y_n)$ is, of course, just the distance between the set $E_{n-1}$ and the point $y_n$ (cf. Problem 9, p. 54).


Turning to the space of functions continuous on the interval $[a, b]$, we now establish conditions under which the "integral operator" $A$ defined by

$$\psi(x) = (A\varphi)(x) = \int_a^b K(x, y)\varphi(y)\, dy \qquad (1)$$

is completely continuous.

Theorem 1. Suppose the kernel $K(x, y)$ is such that

1) $K(x, y)$ is bounded on the square $a \le x \le b$, $a \le y \le b$;

2) The discontinuities (if any) of $K(x, y)$ all lie on a finite number of curves

$$y = f_k(x) \qquad (k = 1, \dots, n),$$

where the functions $f_k$ are continuous.

Then (1) is a completely continuous operator mapping $C_{[a,b]}$ into itself.

Proof. First we note that the conditions 1) and 2) guarantee the existence of the integral (1) for every $x \in [a, b]$, so that $\psi(x)$ is defined on $[a, b]$. Let $R$ be the square $a \le x \le b$, $a \le y \le b$, and let

$$M = \sup_{(x, y) \in R} |K(x, y)|. \qquad (2)$$

Moreover, given any $\varepsilon > 0$, let $G$ be the set of all points $(x, y) \in R$ such that

$$|y - f_k(x)| < \frac{\varepsilon}{12Mn}$$

for at least one integer $k = 1, \dots, n$, and let $F = R - G$. Since $F$ is compact (why?) and $K(x, y)$ is continuous on $F$, there is a $\delta > 0$ such that

$$|K(x', y) - K(x'', y)| < \frac{\varepsilon}{3(b - a)} \qquad (3)$$

for any two points $(x', y), (x'', y) \in F$ satisfying the condition

$$|x' - x''| < \delta \qquad (4)$$

(recall Theorem 1, p. 109).

Now suppose (4) holds. Then

$$|\psi(x') - \psi(x'')| \le \int_a^b |K(x', y) - K(x'', y)|\,|\varphi(y)|\, dy. \qquad (5)$$
To estimate the integral on the right, we divide the interval $a \le y \le b$ into the set

$$P = \bigcup_{k=1}^{n} \left\{y : |y - f_k(x')| < \frac{\varepsilon}{12Mn}\right\} \cup \bigcup_{k=1}^{n} \left\{y : |y - f_k(x'')| < \frac{\varepsilon}{12Mn}\right\}$$

and the complementary set $Q = [a, b] - P$. Using (2) and noting that $P$ is a union of intervals of total length no greater than $\varepsilon/3M$, we have

$$\int_P |K(x', y) - K(x'', y)|\,|\varphi(y)|\, dy \le \frac{2\varepsilon}{3}\|\varphi\|, \qquad (6)$$

where, as usual,

$$\|\varphi\| = \sup_{a \le y \le b} |\varphi(y)|.$$

On the other hand, it follows from (3) and (4) that

$$\int_Q |K(x', y) - K(x'', y)|\,|\varphi(y)|\, dy \le \frac{\varepsilon}{3}\|\varphi\|. \qquad (7)$$

Comparing (5)-(7), we find that (4) implies

$$|\psi(x') - \psi(x'')| \le \varepsilon\|\varphi\|. \qquad (8)$$

In particular, $\psi$ is continuous on $[a, b]$, so that the operator $A$ defined by (1) actually maps the space $C_{[a,b]}$ into itself. Moreover, it follows from (8) and from the estimate

$$\|\psi\| = \sup_{a \le x \le b} |\psi(x)| \le \sup_{a \le x \le b} \int_a^b |K(x, y)|\,|\varphi(y)|\, dy \le M(b - a)\|\varphi\|$$

that $A$ carries any (uniformly) bounded set of functions $\Phi \subset C_{[a,b]}$ into a (uniformly) bounded equicontinuous set $\Psi \subset C_{[a,b]}$ (recall Definitions 3 and 4, p. 102). But then $\Psi$ is relatively compact, by Arzelà's theorem (Theorem 4, p. 102), and hence $A$ is completely continuous. ∎

Remark 1. The requirement that the discontinuities of the kernel $K(x, y)$ lie on a finite number of curves, each intersecting the lines $x = \text{const}$ in a single point, is essential. For example, let $K(x, y)$ be the function


K(x, y) 


defined on the square $0 \le x \le 1$, $0 \le y \le 1$. Then $K(x, y)$ is discontinuous at every point of the line segment $x = 0$, $0 \le y \le 1$, and the operator (1) with this kernel maps the function $x(t) \equiv 1$ into a discontinuous function.

Remark 2. If $K(x, y) = 0$ for $y > x$, then (1) takes the form

$$\psi(x) = (A\varphi)(x) = \int_a^x K(x, y)\varphi(y)\, dy.$$
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Suppose K(x,y) is continuous for y < x. Then it follows from Theorem 1 
that the operator A, called a Volterra operator, is completely continuous. 
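The two estimates that drive the proof of Theorem 1, the uniform bound $\|\psi\| \le M(b - a)\|\varphi\|$ and equicontinuity, can be checked numerically for the simplest Volterra kernel. This is only a sketch under stated assumptions: it takes $[a, b] = [0, 1]$, the hypothetical kernel $K(x, y) = 1$ for $y < x$ and $0$ otherwise (so $M = 1$ and the only discontinuity curve is $y = x$), a hypothetical clipped test function with $\|\varphi\| \le 1$, and a midpoint sum in place of the integral. For this kernel equicontinuity takes the explicit form $|\psi(x') - \psi(x'')| \le M|x' - x''|\,\|\varphi\|$.

```python
import math

def apply_kernel(kernel, phi, a, b, xs, n=2000):
    """Approximate psi(x) = integral from a to b of K(x, y) phi(y) dy at each x
    in xs, using the composite midpoint rule with n subintervals."""
    h = (b - a) / n
    ys = [a + (i + 0.5) * h for i in range(n)]
    return [h * sum(kernel(x, y) * phi(y) for y in ys) for x in xs]

def volterra_kernel(x, y):
    # Hypothetical kernel: K(x, y) = 1 for y < x and 0 otherwise.
    # |K| <= M = 1, and K jumps only across the single curve y = x.
    return 1.0 if y < x else 0.0

def phi(y):
    # Hypothetical continuous test function, clipped so that |phi(y)| <= 1.
    s = 0.8 * math.sin(7 * y) + 0.6 * math.cos(23 * y)
    return max(-1.0, min(1.0, s))

a, b, M = 0.0, 1.0, 1.0
xs = [i / 50 for i in range(51)]
psi = apply_kernel(volterra_kernel, phi, a, b, xs)

# Uniform bound: sup |psi| <= M (b - a) ||phi||, and here ||phi|| <= 1.
sup_psi = max(abs(v) for v in psi)
# Equicontinuity: the largest difference quotient of psi over the grid.
max_slope = max(
    abs(psi[i] - psi[j]) / (xs[i] - xs[j])
    for i in range(len(xs)) for j in range(i)
)
print(f"sup |psi| = {sup_psi:.4f}  (bound {M * (b - a)})")
print(f"max slope = {max_slope:.4f}  (bound {M})")
```

The same two numbers stay bounded for every $\varphi$ with $\|\varphi\| \le 1$, which is exactly what Arzelà's theorem needs.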


24.2. Basic properties of completely continuous operators. We begin with 

Theorem 2. Given a sequence $\{A_n\}$ of completely continuous operators mapping a Banach space $E$ into itself, suppose $\{A_n\}$ converges in norm to an operator $A$, i.e., suppose $\|A - A_n\| \to 0$ as $n \to \infty$. Then $A$ is itself completely continuous.

Proof. To prove that $A$ is completely continuous, we need only show that the sequence $\{Ax_n\}$ contains a convergent subsequence whenever the sequence $\{x_n\}$ of elements $x_n \in E$ is bounded, i.e., such that

$$\|x_n\| \le M \qquad (9)$$

for some $M > 0$ and all $n = 1, 2, \dots$ (why is $A$ linear?). Since $A_1$ is completely continuous, the sequence $\{A_1 x_n\}$ contains a convergent subsequence. In other words, there is a subsequence $\{x_n^{(1)}\}$ of the sequence $\{x_n\}$ such that $\{A_1 x_n^{(1)}\}$ converges. Similarly, since $A_2$ is completely continuous, the sequence $\{A_2 x_n^{(1)}\}$ in turn contains a convergent subsequence. Thus there is a subsequence $\{x_n^{(2)}\}$ of the sequence $\{x_n^{(1)}\}$ such that $\{A_2 x_n^{(2)}\}$ converges. Then obviously $\{A_1 x_n^{(2)}\}$ also converges. Continuing this argument, we find a subsequence $\{x_n^{(3)}\}$ of the sequence $\{x_n^{(2)}\}$ such that $\{A_1 x_n^{(3)}\}$, $\{A_2 x_n^{(3)}\}$, $\{A_3 x_n^{(3)}\}$ all converge, and so on. Consider the "diagonal sequence"

$$x_1^{(1)}, x_2^{(2)}, \dots, x_n^{(n)}, \dots$$

Then clearly each of the operators $A_1, A_2, \dots, A_n, \dots$ maps this sequence into a convergent sequence, since for each $k$ the terms $x_n^{(n)}$ with $n \ge k$ form a subsequence of $\{x_n^{(k)}\}$.

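We now show that the sequence $\{Ax_n^{(n)}\}$ also converges, thereby completing the proof. Since the space $E$ is complete, it is enough to show that $\{Ax_n^{(n)}\}$ is a Cauchy sequence. Clearly

$$\|Ax_n^{(n)} - Ax_{n'}^{(n')}\| \le \|Ax_n^{(n)} - A_k x_n^{(n)}\| + \|A_k x_n^{(n)} - A_k x_{n'}^{(n')}\| + \|A_k x_{n'}^{(n')} - Ax_{n'}^{(n')}\|. \qquad (10)$$

Given any $\varepsilon > 0$, first choose $k$ such that

$$\|A - A_k\| < \frac{\varepsilon}{3M}. \qquad (11)$$

Next, using the fact that $\{A_k x_n^{(n)}\}$ converges and hence is a Cauchy sequence, choose $N$ such that

$$\|A_k x_n^{(n)} - A_k x_{n'}^{(n')}\| < \frac{\varepsilon}{3} \qquad (12)$$

for all $n, n' > N$. Then it follows from (9)–(12) that

$$\|Ax_n^{(n)} - Ax_{n'}^{(n')}\| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon$$

for all sufficiently large $n$ and $n'$, i.e., $\{Ax_n^{(n)}\}$ is a Cauchy sequence. |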
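Theorem 2 is typically used by approximating an operator in norm by operators of finite rank. The following sketch is an illustration, not part of the proof: it takes the diagonal operator of Problem 2 below, $Ax = (x_1, x_2/2, \dots, x_n/n, \dots)$, works with vectors cut off at a hypothetical length `L`, and checks that the finite-rank operator $A_N$, which keeps only the first $N$ coordinates of $Ax$, satisfies $\|(A - A_N)x\| \le \|x\|/(N + 1)$, with equality at the unit vector $e_{N+1}$. Hence $\|A - A_N\| = 1/(N + 1) \to 0$.

```python
import math
import random

def apply_diag(x, n_keep=None):
    """Apply A x = (x_1, x_2/2, ..., x_n/n, ...) to a finite vector x.
    With n_keep = N, return the finite-rank truncation A_N x, which keeps
    only the first N coordinates of A x and sets the rest to zero."""
    y = [xi / (i + 1) for i, xi in enumerate(x)]
    if n_keep is not None:
        y = [yi if i < n_keep else 0.0 for i, yi in enumerate(y)]
    return y

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def diff_ratio(x, n_keep):
    """||(A - A_N) x|| / ||x|| for a nonzero vector x."""
    ax, anx = apply_diag(x), apply_diag(x, n_keep)
    return norm([p - q for p, q in zip(ax, anx)]) / norm(x)

L = 200  # hypothetical cut-off length standing in for the infinite sequence space l_2
rng = random.Random(1)
for N in (1, 5, 20, 100):
    worst = max(diff_ratio([rng.gauss(0, 1) for _ in range(L)], N) for _ in range(50))
    e = [0.0] * L
    e[N] = 1.0  # the unit vector e_{N+1} (0-based index N)
    print(N, round(worst, 6), diff_ratio(e, N), 1 / (N + 1))
```

Each $A_N$ has finite rank and is therefore completely continuous, so Theorem 2 gives the complete continuity of $A$.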

Not only is the set of completely continuous operators closed (algebraically) under operator multiplication, but we have the following much stronger result:

Theorem 3. Let A be a completely continuous operator and B a 
bounded operator mapping a Banach space E into itself. Then the operators 
AB and BA are completely continuous. 

Proof. If the set $M \subset E$ is bounded, then $BM = \{y : y = Bx,\ x \in M\}$ is also bounded. Therefore $ABM$ is relatively compact, and hence $AB$ is completely continuous. Moreover, if $M$ is bounded, then $AM$ is relatively compact, and hence $BAM$ is also relatively compact by the continuity of $B$, i.e., $BA$ is completely continuous. |
Corollary. A completely continuous operator $A$ mapping a Banach space $E$ into itself cannot have a bounded inverse if $E$ is infinite-dimensional.

Proof. If $A^{-1}$ were bounded, then, by Theorem 3, the identity operator $I = A^{-1}A$ would be completely continuous. But this is impossible, by Example 1, p. 240. |

Theorem 4. Let $A$ be a completely continuous operator mapping a Banach space $E$ into itself. Then the adjoint operator $A^*$ is also completely continuous.

Proof. We must show that $A^*$ carries every bounded subset of the conjugate space $E^*$ into a relatively compact set. Since every bounded subset of a normed linear space is contained in some closed sphere, it is enough to show that $A^*$ maps every closed sphere into a relatively compact set. In fact, by the linearity of $A^*$, we need only show that the image $A^*S^*$ of the closed unit sphere $S^* \subset E^*$ is relatively compact.

Now suppose we regard the elements of $E^*$ as functionals not on the whole space $E$ but only on the compactum $[AS]$ equal to the closure of the image of the closed unit sphere $S \subset E$ under the operator $A$. Then the set $\Phi$ of functionals on $[AS]$ corresponding to those in $S^*$ is uniformly bounded and equicontinuous, since $\|\varphi\| \le 1$ implies

$$\sup_{x \in AS} |\varphi(x)| = \sup_{x \in S} |\varphi(Ax)| \le \|\varphi\| \sup_{x \in S} \|Ax\| \le \|A\|$$

and

$$|\varphi(x') - \varphi(x'')| \le \|\varphi\|\,\|x' - x''\| \le \|x' - x''\|.$$

Hence, by Arzelà's theorem (Theorem 4, p. 102), $\Phi$ is relatively compact in the space $C[AS]$ of all continuous functions on $[AS]$. But the set $\Phi$, with the metric induced by the usual metric of $C[AS]$, is isometric to the set $A^*S^*$, with the metric induced by the norm of the space $E^*$. In fact, if $g_1, g_2 \in S^*$, then

$$\|A^*g_1 - A^*g_2\| = \sup_{x \in S} |(A^*g_1 - A^*g_2, x)| = \sup_{x \in S} |(g_1 - g_2, Ax)| = \sup_{z \in AS} |(g_1 - g_2, z)| = \sup_{z \in [AS]} |(g_1 - g_2, z)| = \rho(g_1, g_2).$$

Being relatively compact, the set $\Phi$ is totally bounded, by Theorem 3, p. 101. Therefore the set $A^*S^*$ isometric to $\Phi$ is also totally bounded, and hence relatively compact, by the same theorem. |

Theorem 5. Let $A$ be a completely continuous operator mapping a Banach space $E$ into itself. Then, given any $\rho > 0$, there are only finitely many linearly independent eigenvectors of $A$ corresponding to eigenvalues of absolute value greater than $\rho$.

Proof. Given any nonzero eigenvalue $\lambda$ of $A$, let $E_\lambda$ be the subspace of $E$ consisting of all eigenvectors of $A$ corresponding to $\lambda$.$^5$ Then $E_\lambda$ is finite-dimensional, since otherwise $A$ would fail to be completely continuous in $E_\lambda$ and hence in $E$ itself, by virtually the same argument as in Example 1, p. 240. Therefore, to complete the proof, we need only show that if $\{\lambda_n\}$ is any sequence of distinct eigenvalues of $A$, then $\lambda_n \to 0$ as $n \to \infty$. This in turn will be proved once we show that there is no infinite sequence $\{\lambda_n\}$ of distinct eigenvalues of $A$ such that the sequence $\{1/\lambda_n\}$ is bounded.

Thus, suppose there is a sequence $\{\lambda_n\}$ of distinct eigenvalues of $A$ such that $\{1/\lambda_n\}$ is bounded, and let $x_n$ be an eigenvector of $A$ corresponding to the eigenvalue $\lambda_n$. Then the vectors $x_1, x_2, \dots$ are linearly independent, by the same argument as in the case where $E$ is finite-dimensional.$^6$ Let $E_n$ be the subspace generated by $x_1, \dots, x_n$, i.e., the set of all elements of the form

$$y = \sum_{k=1}^{n} \alpha_k x_k.$$

For every $y \in E_n$, we have

$$y - \frac{1}{\lambda_n} Ay = \sum_{k=1}^{n} \alpha_k \left(1 - \frac{\lambda_k}{\lambda_n}\right) x_k = \sum_{k=1}^{n-1} \alpha_k \left(1 - \frac{\lambda_k}{\lambda_n}\right) x_k,$$

so that

$$y - \frac{1}{\lambda_n} Ay \in E_{n-1}.$$

5 Note that $E_\lambda$ is invariant under $A$ in the sense that $x \in E_\lambda$ implies $Ax \in E_\lambda$ (cf. Problem 5, p. 238).

6 See e.g., G. E. Shilov, op. cit., Lemma 1, p. 182.

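Let $\{y_n\}$ be a sequence such that $y_n \in E_n$, $\|y_n\| = 1$ and

$$\rho(E_{n-1}, y_n) = \inf_{x \in E_{n-1}} \|x - y_n\| > \frac{1}{2}$$

(such a sequence exists by the lemma on p. 240). Then $\{y_n/\lambda_n\}$ is a bounded sequence in $E$, since the numerical sequence $\{1/\lambda_n\}$ is bounded. But at the same time the sequence $\{A(y_n/\lambda_n)\}$ cannot contain a convergent subsequence, contrary to the complete continuity of $A$, since

$$\left\| A\left(\frac{y_p}{\lambda_p}\right) - A\left(\frac{y_q}{\lambda_q}\right) \right\| > \frac{1}{2}$$

for all $p > q$, because

$$A\left(\frac{y_p}{\lambda_p}\right) - A\left(\frac{y_q}{\lambda_q}\right) = y_p - \left( y_p - \frac{1}{\lambda_p} Ay_p + \frac{1}{\lambda_q} Ay_q \right),$$

where the expression in parentheses belongs to $E_{p-1}$ (recall that $Ay_q \in E_q \subset E_{p-1}$). This contradiction proves the theorem. |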
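The mechanism of this proof can be watched in a finite-dimensional analogue. In a Euclidean space the vectors $y_n$ can be produced by the Gram–Schmidt process applied to $x_1, x_2, \dots$, which makes $y_n$ orthogonal to $E_{n-1}$, so that its distance from $E_{n-1}$ equals 1; the decomposition used in the proof then gives $\|A(y_p/\lambda_p) - A(y_q/\lambda_q)\| \ge 1$ for $p > q$. The sketch uses a hypothetical upper triangular $4 \times 4$ matrix (so its eigenvalues are its diagonal entries). A finite matrix is of course completely continuous; the point is only to check the identity and the separation numerically, not to reproduce the infinite-dimensional contradiction.

```python
import math

def matvec(T, x):
    return [sum(t * xj for t, xj in zip(row, x)) for row in T]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def eigenvectors_upper_triangular(T):
    """Eigenvector x_k for the eigenvalue T[k][k] of an upper triangular T with
    distinct diagonal entries, by back substitution (x_k[k] = 1, x_k[i] = 0 for i > k)."""
    n = len(T)
    vecs = []
    for k in range(n):
        lam = T[k][k]
        x = [0.0] * n
        x[k] = 1.0
        for i in range(k - 1, -1, -1):
            s = sum(T[i][j] * x[j] for j in range(i + 1, k + 1))
            x[i] = -s / (T[i][i] - lam)
        vecs.append(x)
    return vecs

def gram_schmidt(vecs):
    """Orthonormalize vecs in order: y_n lies in E_n = span(x_1, ..., x_n) and is
    orthogonal to E_{n-1}, so its distance from E_{n-1} is exactly 1."""
    basis = []
    for v in vecs:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        nrm = math.sqrt(dot(w, w))
        basis.append([wi / nrm for wi in w])
    return basis

# Hypothetical upper triangular matrix with distinct eigenvalues 1, 1/2, 1/4, 1/8.
T = [
    [1.0, 2.0, -1.0, 0.5],
    [0.0, 0.5, 3.0, 1.0],
    [0.0, 0.0, 0.25, -2.0],
    [0.0, 0.0, 0.0, 0.125],
]
lams = [T[k][k] for k in range(len(T))]
xs = eigenvectors_upper_triangular(T)
ys = gram_schmidt(xs)
ws = [[t / lam for t in matvec(T, y)] for y, lam in zip(ys, lams)]  # A(y_n / lambda_n)
gaps = [math.dist(ws[p], ws[q]) for p in range(len(ws)) for q in range(p)]
print("smallest ||A(y_p/lambda_p) - A(y_q/lambda_q)||:", min(gaps))
```

In an infinite-dimensional space with infinitely many such $\lambda_n$ and $\{1/\lambda_n\}$ bounded, the same separation would persist forever, which is what rules out a convergent subsequence.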


24.3. Completely continuous operators in Hilbert space. Specializing to 
the case of completely continuous operators mapping a Hilbert space into 
itself, we have 

Theorem 6. Let $A$ be a linear operator mapping a Hilbert space $H$ into itself. Then $A$ is completely continuous if and only if

1) $A$ maps every relatively compact set in the weak topology into a relatively compact set in the strong topology,

or, equivalently, if and only if

2) $A$ maps every weakly convergent sequence into a strongly convergent sequence.

Proof. To prove 1), we merely note that H is the conjugate of a 
separable space, since H = H*, and hence, by Corollary 2', p. 205, a 
subset of H is bounded if and only if it is relatively compact in the weak 
topology. 

To prove 2), suppose $A$ maps every weakly convergent sequence into a strongly convergent sequence, and let $M$ be a bounded closed subset of $H$. Then every sequence in $M$ contains a weakly convergent subsequence, whose image under $A$ converges strongly; hence every sequence in $AM$ contains a strongly convergent subsequence, i.e., $AM$ is relatively compact in the strong topology. It follows that $A$ is completely continuous. Conversely, if $A$ is completely continuous, let $\{x_n\}$ be a weakly convergent sequence with weak limit $x$. Since $\{x_n\}$ is bounded, every subsequence of $\{Ax_n\}$ contains a strongly convergent subsequence. At the same time, $\{Ax_n\}$ converges weakly to $Ax$, by the continuity of $A$, so that $\{Ax_n\}$ cannot have more than one limit point. Therefore $\{Ax_n\}$ is a strongly convergent sequence. |

Let $A$ be a self-adjoint operator in a finite-dimensional complex Euclidean space, and suppose $A$ has matrix $\|a_{ik}\|$ (recall Example 3, p. 222). Then it will be recalled from linear algebra that $\|a_{ik}\|$ can be reduced to diagonal form with respect to a suitable orthonormal basis.$^7$ We now generalize this result to the case of a completely continuous self-adjoint operator in a (real or complex) Hilbert space (see Theorem 7 below), after first proving two preliminary lemmas:


Lemma 1. Let $A$ be a completely continuous self-adjoint operator mapping a Hilbert space $H$ into itself, and let $\{x_n\}$ be a sequence in $H$ converging weakly to $x$. Then

$$(Ax_n, x_n) \to (Ax, x) \qquad (13)$$

as $n \to \infty$.

Proof. Clearly,

$$|(Ax_n, x_n) - (Ax, x)| \le |(Ax_n, x_n) - (Ax, x_n)| + |(Ax, x_n) - (Ax, x)|.$$

But

$$|(Ax_n, x_n) - (Ax, x_n)| \le \|x_n\|\,\|A(x_n - x)\|,$$

and

$$|(Ax, x_n) - (Ax, x)| = |(x, A(x_n - x))| \le \|x\|\,\|A(x_n - x)\|,$$

where the numbers $\|x_n\|$, $n = 1, 2, \dots$ are bounded, by Theorem 2, p. 196, and $\|A(x_n - x)\| \to 0$ by Theorem 6. Therefore

$$|(Ax_n, x_n) - (Ax, x)| \to 0$$

as $n \to \infty$, which is equivalent to (13). |

Lemma 2. Given a bounded linear operator $A$ mapping a Hilbert space $H$ into itself, let $A$ be self-adjoint and suppose the least upper bound of the functional

$$|Q(x)| = |(Ax, x)|$$

on the closed unit sphere $\|x\| \le 1$ is achieved at the point $x = x_0$. Then

$$(x_0, y) = 0 \qquad (14)$$

implies

$$(Ax_0, y) = (x_0, Ay) = 0.$$

In particular, $x_0$ is an eigenvector of $A$.


7 See e.g., V. I. Smirnov, Linear Algebra and Group Theory (translated by R. A. 
Silverman), McGraw-Hill Book Co., New York (1961), Sec. 40. Dover reprint (1970).
Proof. Obviously, $\|x_0\| = 1$. Let

$$x = \frac{x_0 + ay}{\sqrt{1 + |a|^2\,\|y\|^2}}, \qquad (15)$$

where $a$ is an arbitrary complex number. Then $\|x\| = 1$, because of (14) and (15). Since

$$Q(x) = \frac{Q(x_0) + 2\,\mathrm{Re}\,\bar{a}(Ax_0, y) + |a|^2 Q(y)}{1 + |a|^2\,\|y\|^2},$$

we have

$$Q(x) = Q(x_0) + 2\,\mathrm{Re}\,\bar{a}(Ax_0, y) + O(|a|^2) \qquad (16)$$

for small $|a|$. But it is clear from (16) that if $(Ax_0, y) \ne 0$, then $a$ can be chosen to make $|Q(x)| > |Q(x_0)|$, contrary to the assumption that the least upper bound of $|Q(x)|$ on the closed unit sphere is achieved at the point $x = x_0$. Therefore $(Ax_0, y) = 0$ as asserted, i.e., $Ax_0$ is orthogonal to every vector orthogonal to $x_0$. It follows that $Ax_0$ and $x_0$ are proportional (why?), so that $x_0$ is an eigenvector of $A$. |
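Lemma 2 can be seen concretely in the plane. The sketch below is an illustration under stated assumptions: it uses a hypothetical real symmetric $2 \times 2$ matrix and a brute-force search over the unit circle (not anything from the text) to locate the maximizer $x_0$ of $|Q(x)| = |(Ax, x)|$, and then checks that $Ax_0$ is parallel to $x_0$, i.e., that $x_0$ is an eigenvector.

```python
import math

def q_and_image(A, x):
    """Return (Q(x), A x) where Q(x) = (A x, x), for a real 2x2 matrix A."""
    ax = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
    return ax[0] * x[0] + ax[1] * x[1], ax

A = [[2.0, 1.0], [1.0, -0.5]]  # hypothetical symmetric (self-adjoint) matrix

# Brute-force search for the maximum of |Q(x)| over unit vectors x = (cos t, sin t);
# t in [0, pi) suffices because Q(-x) = Q(x).
steps = 200_000
best_t = max(
    (math.pi * k / steps for k in range(steps)),
    key=lambda t: abs(q_and_image(A, (math.cos(t), math.sin(t)))[0]),
)
x0 = (math.cos(best_t), math.sin(best_t))
q0, ax0 = q_and_image(A, x0)

# x0 is (nearly) an eigenvector exactly when A x0 is (nearly) parallel to x0,
# i.e. when the 2D cross product of x0 and A x0 is (nearly) zero.
cross = x0[0] * ax0[1] - x0[1] * ax0[0]
print(f"max |Q| = {abs(q0):.6f}, cross product = {cross:.1e}")
```

The maximal value of $|Q|$ found this way is, up to the grid resolution, the eigenvalue of largest absolute value, $(1.5 + \sqrt{10.25})/2$ for this matrix.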


Theorem 7 (Hilbert–Schmidt). Let $A$ be a completely continuous self-adjoint operator mapping a Hilbert space $H$ into itself. Then there is an orthonormal system $\varphi_1, \varphi_2, \dots$ of eigenvectors of $A$, with corresponding nonzero eigenvalues $\lambda_1, \lambda_2, \dots$, such that every element $x \in H$ has a unique representation of the form$^8$

$$x = \sum_n c_n \varphi_n + x', \qquad (17)$$

where $x'$ satisfies the condition $Ax' = 0$. Moreover

$$Ax = \sum_n \lambda_n c_n \varphi_n, \qquad (18)$$

and

$$\lim_{n \to \infty} \lambda_n = 0$$

in the case where there are infinitely many nonzero eigenvalues.

Proof. Let

$$M_1 = \sup_{\substack{\|x\| \le 1 \\ x \in H}} |(Ax, x)|,$$

and let $\{x_n\}$ be a sequence of elements of $H$ such that $\|x_n\| = 1$ and

$$|(Ax_n, x_n)| \to M_1$$

as $n \to \infty$. Since the closed unit sphere in $H$ is weakly compact (recall Corollary 2', p. 205), we can find a subsequence of $\{x_n\}$ which converges weakly to an element $y \in H$, where clearly $\|y\| \le 1$. By Lemma 1,

$$|(Ay, y)| = M_1,$$

and hence, by Lemma 2, $y$ is an eigenvector of $A$. Moreover $\|y\| = 1$, since if $\|y\| < 1$, then choosing

$$y' = \frac{y}{\|y\|},$$

we would have $\|y'\| = 1$ and

$$|(Ay', y')| > M_1,$$

contrary to the meaning of $M_1$. We choose $y$ as our first eigenvector $\varphi_1$. Let $\lambda_1$ be the corresponding eigenvalue, so that

$$A\varphi_1 = \lambda_1 \varphi_1.$$

Then

$$|\lambda_1| = |(A\varphi_1, \varphi_1)| = M_1.$$

Next let $E_1$ be the subspace of $H$ consisting of all vectors of the form $\alpha\varphi_1$, and let $E_1' = H \ominus E_1$ be the orthogonal complement of $E_1$. Clearly $E_1'$ is again a Hilbert space, mapped into itself by the operator $A$ (this follows from Problem 5, p. 238 and the fact that $A$ is self-adjoint). Let

$$M_2 = \sup_{\substack{\|x\| \le 1 \\ x \in E_1'}} |(Ax, x)|. \qquad (19)$$

Then, by the same argument as before, we can find an eigenvector $\varphi_2$ of $A$ such that $\varphi_2 \in E_1'$, $\|\varphi_2\| = 1$. Let $\lambda_2$ be the corresponding eigenvalue, so that

$$A\varphi_2 = \lambda_2 \varphi_2.$$

Then

$$|\lambda_2| = |(A\varphi_2, \varphi_2)| = M_2,$$

and hence

$$|\lambda_1| \ge |\lambda_2|,$$

since $H \supset E_1'$ implies

$$M_1 = \sup_{\substack{\|x\| \le 1 \\ x \in H}} |(Ax, x)| \ge \sup_{\substack{\|x\| \le 1 \\ x \in E_1'}} |(Ax, x)| = M_2.$$

By its very construction, $\varphi_2$ is orthogonal to $\varphi_1$.

To construct further eigenvectors of $A$, we argue inductively, replacing (19) by

$$M_{n+1} = \sup_{\substack{\|x\| \le 1 \\ x \in E_n'}} |(Ax, x)| \qquad (n = 1, 2, \dots),$$

where $E_n' = H \ominus E_n$ is the orthogonal complement of the subspace $E_n$ generated by the previously constructed eigenvectors $\varphi_1, \varphi_2, \dots, \varphi_n$. Then $E_n'$ is again a Hilbert space mapped into itself by $A$, and there is an eigenvector $\varphi_{n+1} \in E_n'$ of unit norm, with corresponding eigenvalue $\lambda_{n+1}$ satisfying the inequality

$$|\lambda_n| \ge |\lambda_{n+1}| \qquad (n = 1, 2, \dots).$$

8 As will appear in the course of the proof, the sums in (17) and (18) may be finite or infinite, and $x'$ may vanish.

In this way, we construct an orthonormal system $\{\varphi_n\}$ of eigenvectors of $A$. There are now just two possibilities, which we examine in turn:

Case 1. Suppose the construction of the sequence $\{\varphi_n\}$ terminates after a finite number of steps, i.e., suppose there is a positive integer $n_0$ such that $(Ax, x) = 0$ on $E'_{n_0}$. Then it follows from Lemma 2 that $A$ maps the whole space $E'_{n_0}$ into the zero vector. According to Theorem 14, p. 158, every element $x \in H$ has a unique representation of the form

$$x = h + x',$$

where $h \in E_{n_0}$, $x' \in E'_{n_0}$, and hence of the form

$$x = \sum_n c_n \varphi_n + x',$$

where the sum is finite (consisting of $n_0$ terms) and $Ax' = 0$. Obviously we have

$$Ax = \sum_n \lambda_n c_n \varphi_n,$$

thereby completing the proof in this case.

Case 2. Suppose the construction of the sequence $\{\varphi_n\}$ never terminates, i.e., suppose $(Ax, x) \not\equiv 0$ on $E_n'$ for all $n = 1, 2, \dots$. We then have infinitely many nonzero eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n, \dots$. Clearly $\lambda_n \to 0$ as $n \to \infty$. In fact, the sequence $\{\varphi_n\}$ converges weakly to zero, like any sequence of orthonormal vectors (why?), and hence, by Theorem 6, the sequence $\{A\varphi_n\}$ converges to zero in norm, so that $\|A\varphi_n\| \to 0$ and hence $\|\lambda_n \varphi_n\| = |\lambda_n| \to 0$. Let $E_\infty$ be the subspace of $H$ generated by all the eigenvectors $\varphi_1, \varphi_2, \dots, \varphi_n, \dots$, i.e., the set of all linear combinations of the form

$$\sum_{n=1}^{\infty} c_n \varphi_n,$$

and let

$$E_\infty' = H \ominus E_\infty = \bigcap_{n=1}^{\infty} E_n'.$$

If $E_\infty' = \{0\}$, then $H = E_\infty$ and $x$ obviously has a representation of the form (17) with $x' = 0$ (so that $Ax' = 0$ trivially). If $E_\infty' \ne \{0\}$, let $x$ be any nonzero element of $E_\infty'$. Then

$$|(Ax, x)| \le |\lambda_n|\,\|x\|^2$$

for all $n = 1, 2, \dots$, and hence $(Ax, x) = 0$ on $E_\infty'$. It follows from Lemma 2 that $A$ maps the whole space $E_\infty'$ into the zero vector. The rest of the proof is the same as in Case 1, where (18) follows from (17) by the continuity of $A$. |
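For a symmetric matrix the construction in this proof can be imitated numerically: find a unit vector maximizing $|(Ax, x)|$, then repeat inside the orthogonal complement of the eigenvectors already found, which yields $|\lambda_1| \ge |\lambda_2| \ge \dots$. The sketch below is an illustration under assumptions, not the proof's method: it replaces the exact supremum by power iteration (a standard technique that converges to an eigenvector of largest $|\lambda|$ when that largest value belongs to a single eigenvalue), projects every iterate onto the current complement $E_n'$, and uses a hypothetical $3 \times 3$ matrix.

```python
import math

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(x, basis):
    """Orthogonal projection of x onto the complement of span(basis);
    basis is assumed orthonormal."""
    for b in basis:
        c = dot(x, b)
        x = [xi - c * bi for xi, bi in zip(x, b)]
    return x

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]

def deflated_eigenpairs(A, iters=500):
    """Eigenpairs of a symmetric matrix A in order of decreasing |lambda|, each
    found by power iteration restricted to the orthogonal complement of the
    eigenvectors already found (the analogue of the subspaces E_n')."""
    n = len(A)
    found = []
    for _ in range(n):
        basis = [v for v, _ in found]
        # Generic starting vector, projected into the current complement.
        x = normalize(project_out([1.0 + 0.1 * i for i in range(n)], basis))
        for _ in range(iters):
            x = normalize(project_out(matvec(A, x), basis))
        lam = dot(matvec(A, x), x)  # (Ax, x) with ||x|| = 1
        found.append((x, lam))
    return found

A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]  # hypothetical symmetric matrix
pairs = deflated_eigenpairs(A)
for v, lam in pairs:
    print(round(lam, 6), [round(t, 6) for t in v])
```

The eigenvalues of this matrix are $2 + \sqrt{2}$, $2$ and $2 - \sqrt{2}$, and they appear in that order, mirroring $M_1 \ge M_2 \ge M_3$.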

Corollary. Let $A$ be a completely continuous self-adjoint operator mapping a Hilbert space $H$ into itself. Then there is an orthonormal system $\{\psi_n\}$ of eigenvectors of $A$ such that every element $x \in H$ has a unique representation of the form

$$x = \sum_{n=1}^{\infty} c_n \psi_n.$$

Moreover

$$Ax = \sum_{n=1}^{\infty} \lambda_n c_n \psi_n,$$

where $\lambda_1, \lambda_2, \dots$ are the eigenvalues corresponding to $\psi_1, \psi_2, \dots$.

Proof. Noting that every element of $E'_{n_0}$ or $E'_\infty$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda = 0$, let $\{\psi_n\}$ consist of the orthonormal system $\{\varphi_n\}$ constructed in the proof of Theorem 7, together with an arbitrary orthonormal basis in $E'_{n_0}$ or $E'_\infty$. |

Problem 1. Prove that the projection operator of Example 4, p. 222 is completely continuous if and only if the subspace $H_1$ is finite-dimensional.

Problem 2. Prove that the operator $A$ mapping the point

$$x = (x_1, x_2, \dots, x_n, \dots) \in l_2$$

into the point

$$Ax = \left(x_1, \frac{x_2}{2}, \dots, \frac{x_n}{n}, \dots\right) \in l_2$$

is completely continuous. More generally, suppose

$$Ax = (a_1 x_1, a_2 x_2, \dots, a_n x_n, \dots).$$

Under what conditions on the sequence $\{a_n\}$ is $A$ completely continuous?

Hint. Since every bounded set in $l_2$ is contained in some closed sphere, it is enough to show that the images of spheres are relatively compact. In fact, by the linearity of $A$, it need only be shown that the image of the unit sphere is compact. In this regard, recall Example 5, p. 98.

Problem 3. Let $A$ be the integral operator on $C[-1, 1]$ defined by

$$\psi(x) = (A\varphi)(x) = \int_{-1}^{x} \varphi(y)\,dy.$$

Prove that $A$ maps the closed unit sphere in $C[-1, 1]$ into a noncompact set. Reconcile this with Theorem 1.
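Hint. Let

$$\varphi_n(x) = \begin{cases} 0 & \text{if } -1 \le x \le 0, \\ nx & \text{if } 0 \le x \le \frac{1}{n}, \\ 1 & \text{if } \frac{1}{n} \le x \le 1. \end{cases}$$

Then $\varphi_n \in C[-1, 1]$, $\|\varphi_n\| = 1$ for all $n$, and

$$\psi_n(x) = (A\varphi_n)(x) = \begin{cases} 0 & \text{if } -1 \le x \le 0, \\ \frac{1}{2}nx^2 & \text{if } 0 \le x \le \frac{1}{n}, \\ x - \frac{1}{2n} & \text{if } \frac{1}{n} \le x \le 1. \end{cases}$$

The sequence $\{\psi_n\}$ converges in $C[-1, 1]$ to the function

$$\psi(x) = \begin{cases} 0 & \text{if } -1 \le x \le 0, \\ x & \text{if } 0 \le x \le 1, \end{cases}$$

which, having a discontinuous derivative, cannot be the image under $A$ of any function in $C[-1, 1]$.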
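A quick numerical check of the convergence claimed in the hint, using only the formulas stated there: on a grid over $[-1, 1]$ the uniform distance between $\psi_n$ and $\psi$ equals $1/(2n)$ (attained on $[1/n, 1]$), so $\psi_n \to \psi$ in $C[-1, 1]$. This is a sketch; the grid size is an arbitrary choice.

```python
def psi_n(x, n):
    """(A phi_n)(x) from the hint: 0, n x^2 / 2, or x - 1/(2n)."""
    if x <= 0:
        return 0.0
    if x <= 1 / n:
        return n * x * x / 2
    return x - 1 / (2 * n)

def psi_limit(x):
    """The limit function: 0 on [-1, 0] and x on [0, 1]."""
    return 0.0 if x <= 0 else x

grid = [-1 + 2 * k / 20000 for k in range(20001)]

def sup_distance(n):
    """Grid approximation of the C[-1, 1] distance between psi_n and psi."""
    return max(abs(psi_n(x, n) - psi_limit(x)) for x in grid)

for n in (1, 10, 100, 1000):
    print(n, sup_distance(n), 1 / (2 * n))
```

The limit $\psi$ lies in the closure of the image of the unit sphere but not in the image itself, which is why that image is relatively compact (as Theorem 1 guarantees) without being compact.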

Problem 4. Let $A$ be a completely continuous operator mapping a reflexive Banach space $E$ (e.g., a Hilbert space) into itself. Prove that $A$ maps the closed unit sphere in $E$ into a compact set. Reconcile this with the preceding problem.

Hint. Use Theorem 6, p. 205. 

Problem 5. Prove that 

a) A linear combination of completely continuous operators is itself a 
completely continuous operator; 

b) The set $\mathcal{C}(E, E)$ of all completely continuous operators mapping a Banach space $E$ into itself is a closed subspace of the linear space $\mathcal{L}(E, E)$ of all bounded linear operators mapping $E$ into $E$.

Problem 6. Let $\mathcal{L}(E, E)$ and $\mathcal{C}(E, E)$ be the same as in the preceding problem. Prove that besides being a linear space, $\mathcal{L}(E, E)$ is also a ring when equipped with the usual operations of addition and multiplication of operators. Prove that $\mathcal{C}(E, E)$ is a two-sided ideal in $\mathcal{L}(E, E)$.

Comment. By a two-sided ideal in a ring $\mathcal{R}$ is meant a subring $\mathcal{I} \subset \mathcal{R}$ such that $a \in \mathcal{I}$, $r \in \mathcal{R}$ implies $ar \in \mathcal{I}$, $ra \in \mathcal{I}$.
Problem 7. Let $\Phi$ and $A^*S^*$ be the same as in the proof of Theorem 4. Show that $\Phi$ is closed and hence compact. Deduce from this that $A^*S^*$ is compact, even though as shown in Problem 3, the image of the closed unit sphere under a completely continuous operator need not be compact.

Problem 8. Discuss the connection between Theorem 4 and the theory of 
Sec. 20.4, in particular Corollary V, p. 204. 

Problem 9. Let A be a bounded linear operator mapping a Banach space 
E into itself. Show that if A* is completely continuous, then so is A. 

Problem 10. Prove that a linear operator A mapping a Hilbert space H 
into itself is completely continuous if and only if its adjoint (in the sense 
of Sec. 23.3) is completely continuous. 

Problem 11. Give an example of a completely continuous operator A 
mapping a Hilbert space H into itself, such that A has no eigenvectors. 
Reconcile this with Theorem 7. 

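Hint. Let $A$ be the operator in $l_2$ such that

$$Ax = A(x_1, x_2, x_3, \dots, x_n, \dots) = \left(0, x_1, \frac{x_2}{2}, \dots, \frac{x_{n-1}}{n-1}, \dots\right).$$

Then $Ax = \lambda x$ implies

$$\lambda x_1 = 0, \quad \lambda x_2 = x_1, \quad \lambda x_3 = \frac{x_2}{2}, \quad \dots, \quad \lambda x_n = \frac{x_{n-1}}{n-1}, \quad \dots,$$

and hence $x = 0$.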

Comment. This situation differs from the finite-dimensional complex case, where every linear operator (self-adjoint or not) has at least one eigenvector.
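Another way to see that the operator of the hint has no eigenvectors is to estimate its powers. Since $Ae_j = e_{j+1}/j$, we get $A^k e_n = e_{n+k}/(n(n+1)\cdots(n+k-1))$, and because $A^k$ sends distinct basis vectors to mutually orthogonal vectors, $\|A^k\|$ is the largest of these weights, namely $1/k!$ (at $n = 1$). If $Ax = \lambda x$ with $x \ne 0$, then $|\lambda|^k \le \|A^k\| = 1/k!$ for every $k$, forcing $\lambda = 0$, and $Ax = 0$ already forces $x = 0$. This alternative argument goes beyond the hint; the sketch below only checks the weight computation with exact fractions, scanning a finite range of $n$ as a stand-in for all $n$ (an assumption justified by the weights decreasing in $n$).

```python
from fractions import Fraction
from math import factorial

def power_weight(n, k):
    """Coefficient c with A^k e_n = c e_{n+k} for the operator of the hint,
    A(x_1, x_2, ...) = (0, x_1, x_2/2, ..., x_{n-1}/(n-1), ...).
    Since A e_j = e_{j+1} / j, we get c = 1 / (n (n+1) ... (n+k-1))."""
    c = Fraction(1)
    for j in range(n, n + k):
        c /= j
    return c

def power_norm(k, n_max=200):
    """||A^k|| is the supremum of power_weight(n, k) over n, because A^k sends the
    orthonormal basis vectors to mutually orthogonal vectors. The weights decrease
    in n, so scanning n = 1..n_max already finds the supremum (attained at n = 1)."""
    return max(power_weight(n, k) for n in range(1, n_max + 1))

for k in range(1, 8):
    print(k, power_norm(k), Fraction(1, factorial(k)))
```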
The concept of the measure $\mu(E)$ of a set $E$ is a natural generalization of such concepts as

1) The length $l(\Delta)$ of a line segment $\Delta$;

2) The area $A(F)$ of a plane figure $F$;

3) The volume $V(G)$ of a space figure $G$;

4) The increment $\varphi(b) - \varphi(a)$ of a nondecreasing function $\varphi(t)$ over a half-open interval $[a, b)$;

5) The integral of a nonnegative function over a set on the line or over a region in the plane or in space.

Although the notion of measure first arose in the theory of functions of a 
real variable, it was subsequently used extensively in functional analysis, 
probability theory, the theory of dynamical systems, and other branches 
of mathematics. In Sec. 25 we discuss the measure of plane sets, starting 
from the notion of the area of a rectangle. Measure in general will then 
be studied in Secs. 26 and 27. The reader will easily confirm that the considerations in Sec. 25 are of a general nature and carry over to the case of
the more abstract theory without essential changes. 


25. Measure in the Plane 

25.1. Measure of elementary sets. Consider the system $\mathfrak{S}$ of sets in the $xy$-plane, each defined by one of the inequalities

$$a \le x \le b, \quad a \le x < b, \quad a < x \le b, \quad a < x < b$$

and one of the inequalities

$$c \le y \le d, \quad c \le y < d, \quad c < y \le d, \quad c < y < d,$$

where $a, b, c$ and $d$ are arbitrary real numbers. The sets in $\mathfrak{S}$ will be called rectangles. The closed rectangle defined by the inequalities

$$a \le x \le b, \quad c \le y \le d$$

is a rectangle in the usual sense (including its boundary) if $a < b$ and $c < d$, a line segment (including its end points) if $a = b$ and $c < d$ or if $a < b$ and $c = d$, a point if $a = b$, $c = d$, or even the empty set if $a > b$ or $c > d$. The open rectangle

$$a < x < b, \quad c < y < d$$

is either a rectangle in the usual sense (without its boundary) if $a < b$ and $c < d$ or the empty set if $a \ge b$ or $c \ge d$. Each of the rectangles of the remaining types will be called half-open and is an ordinary rectangle minus one, two or three sides, a line segment minus one or two end points, or possibly the empty set.

In keeping with the concept of area familiar from elementary geometry, we now define the measure of each set in $\mathfrak{S}$ as follows:

1) The measure of the empty set equals 0;

2) The measure of the nonempty rectangle (closed, open or half-open) specified by the numbers $a, b, c$ and $d$ equals

$$(b - a)(d - c).$$

Thus with each rectangle $P \in \mathfrak{S}$ we associate a number $m(P)$, called its measure, where clearly

1) $m(P)$ is real and nonnegative;

2) $m(P)$ is additive in the sense that if

$$P = \bigcup_{k=1}^{n} P_k, \qquad P_k \cap P_j = \varnothing \quad (k \ne j),$$

then

$$m(P) = \sum_{k=1}^{n} m(P_k).$$

Our problem is to define the concept of measure for sets more general than 
rectangles, while preserving these two properties. The first step in this 
direction is to define measure for elementary sets, where by an elementary 
set we mean any set which can be represented in at least one way as a union 
of a finite number of pairwise disjoint rectangles. First we prove 

Theorem 1. The union, intersection, difference and symmetric 
difference of two elementary sets are again elementary sets. 


Proof. If

$$A = \bigcup_k P_k, \qquad B = \bigcup_l Q_l$$

are two elementary sets, then clearly

$$A \cap B = \bigcup_{k, l} (P_k \cap Q_l)$$

is also an elementary set, since each $P_k \cap Q_l$ is obviously either a rectangle or the empty set. Moreover, it is easy to see that the difference of two rectangles is an elementary set. Hence, subtracting an elementary set from a rectangle gives another elementary set (as an intersection of elementary sets). Suppose $A$ and $B$ are elementary sets, and let $P$ be a rectangle containing both of them (such a rectangle obviously exists). It follows from what has just been proved that

$$A \cup B = P - [(P - A) \cap (P - B)]$$

is an elementary set. It is then an easy consequence of the formulas

$$A - B = A \cap (P - B), \qquad A \,\Delta\, B = (A \cup B) - (A \cap B)$$

that the difference and symmetric difference of two elementary sets are again elementary sets. |

Remark. In other words, the system of all elementary sets is a ring $\mathfrak{R}$,
as defined on p. 31. 

We now define measure for elementary sets: 

Definition 1. Given an elementary set $A$, suppose

$$A = \bigcup_k P_k,$$

where the $P_k$ are pairwise disjoint rectangles. Then by the measure of $A$, denoted by $m(A)$, is meant the number

$$m(A) = \sum_k m(P_k), \qquad (1)$$

where $m(P_k)$ is the measure of the rectangle $P_k$.

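Remark. Clearly, $m(A)$ is nonnegative and additive. Moreover, in defining $m(A)$, we have tacitly relied on the fact that the sum (1) does not depend on how $A$ is represented as a union of pairwise disjoint rectangles. To verify this, suppose

$$A = \bigcup_k P_k = \bigcup_l Q_l,$$

where $P_k$ and $Q_l$ are rectangles such that

$$P_i \cap P_j = \varnothing, \qquad Q_i \cap Q_j = \varnothing \qquad (i \ne j).$$

Since the intersection $P_k \cap Q_l$ of two rectangles is itself a rectangle, and since each $P_k$ is the disjoint union of the sets $P_k \cap Q_l$ over $l$ (and each $Q_l$ the disjoint union of the same sets over $k$), it follows from the additivity of the measure of rectangles that

$$\sum_k m(P_k) = \sum_{k, l} m(P_k \cap Q_l) = \sum_l m(Q_l).$$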
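The independence of $m(A)$ from the chosen decomposition can be illustrated computationally. The sketch below assumes, for simplicity, that all rectangles are half-open, $[a, b) \times [c, d)$, and encodes each as a tuple `(a, b, c, d)` (a hypothetical representation). Cutting the plane along every $x$- and $y$-coordinate that occurs refines any finite family of rectangles into pairwise disjoint cells, the same refinement idea as the sets $P_k \cap Q_l$ in the remark above, and the measure of the union is the total area of the cells it contains. Exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction

def union_measure(rects):
    """Measure of the union of half-open rectangles [a, b) x [c, d), each given
    as a tuple (a, b, c, d); points covered several times are counted once."""
    xs = sorted({v for r in rects for v in r[:2]})
    ys = sorted({v for r in rects for v in r[2:]})
    total = Fraction(0)
    # Cutting along every coordinate that occurs splits the plane into pairwise
    # disjoint cells, each lying entirely inside or entirely outside every rectangle.
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(a <= x0 and x1 <= b and c <= y0 and y1 <= d for a, b, c, d in rects):
                total += (x1 - x0) * (y1 - y0)
    return total

half = Fraction(1, 2)
# The unit square [0, 1) x [0, 1), written as a disjoint union in two different ways.
strips = [(0, half, 0, 1), (half, 1, 0, 1)]
quarters = [(i * half, (i + 1) * half, j * half, (j + 1) * half)
            for i in range(2) for j in range(2)]
print(union_measure(strips), union_measure(quarters))  # prints: 1 1
# Two overlapping squares: 4 + 4 - 1 = 7.
print(union_measure([(0, 2, 0, 2), (1, 3, 1, 3)]))  # prints: 7
```

Because the cells are pairwise disjoint and each rectangle is a union of cells, the same function also shows that unions of rectangles are elementary sets, in line with Theorem 1.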

Theorem 2. If $A$ is an elementary set and $\{A_n\}$ is a finite or countable system of elementary sets such that

$$A \subset \bigcup_n A_n,$$

then

$$m(A) \le \sum_n m(A_n). \qquad (2)$$

Proof. Given any $\varepsilon > 0$, there is a closed elementary set $\bar{A}$ contained in $A$ and satisfying the condition

$$m(\bar{A}) \ge m(A) - \frac{\varepsilon}{2}.$$

In fact, to get $\bar{A}$ we need only replace each of the $k$ rectangles $P_j$ making up $A$ by a closed rectangle contained in $P_j$ of area no less than

$$m(P_j) - \frac{\varepsilon}{2k}.$$

Moreover, for each $A_n$ there is clearly an open elementary set $\tilde{A}_n$ containing $A_n$ and satisfying the condition

$$m(\tilde{A}_n) \le m(A_n) + \frac{\varepsilon}{2^{n+1}}.$$

Obviously,

$$\bar{A} \subset \bigcup_n \tilde{A}_n.$$

Hence, by the Heine–Borel theorem (recall p. 92), there is a finite system $\tilde{A}_{n_1}, \dots, \tilde{A}_{n_s}$ covering $\bar{A}$, where

$$m(\bar{A}) \le \sum_{i=1}^{s} m(\tilde{A}_{n_i}),$$

since otherwise $\bar{A}$ would be covered by a finite number of rectangles of total area less than $m(\bar{A})$, which is impossible. Therefore

$$m(A) \le m(\bar{A}) + \frac{\varepsilon}{2} \le \sum_{i=1}^{s} m(\tilde{A}_{n_i}) + \frac{\varepsilon}{2} \le \sum_n m(\tilde{A}_n) + \frac{\varepsilon}{2} \le \sum_n m(A_n) + \sum_n \frac{\varepsilon}{2^{n+1}} + \frac{\varepsilon}{2} \le \sum_n m(A_n) + \varepsilon,$$

which implies (2), since $\varepsilon > 0$ is arbitrary. |
25.2. Lebesgue measure of plane sets. Elementary sets are, of course, far from being the most general plane sets considered in geometry and analysis. Thus we naturally arrive at the problem of extending the concept of measure (while preserving its basic properties) to sets more general than finite unions of rectangles with sides parallel to the coordinate axes. This problem is solved in a definitive way by Lebesgue's theory of measure, in which we consider countably infinite unions of rectangles, as well as finite unions. To avoid sets of "infinite measure," we restrict our discussion to subsets of the closed unit square $E$, defined by the inequalities

$$0 \le x \le 1, \quad 0 \le y \le 1$$

(this restriction is dropped in Remarks 2 and 3, p. 267).

Definition 2. By the outer measure of a set $A \subset E$ is meant the number

$$\mu^*(A) = \inf_{A \subset \bigcup_k P_k} \sum_k m(P_k),$$

where the greatest lower bound is taken over all coverings of $A$ by a finite or countable system of rectangles $P_k$.

Definition 3. By the inner measure of a set $A \subset E$ is meant the number

$$\mu_*(A) = 1 - \mu^*(E - A).$$
Theorem 3. The inequality

$$\mu_*(A) \le \mu^*(A)$$

holds for any set $A \subset E$.

Proof. Suppose

$$\mu_*(A) > \mu^*(A),$$

i.e.,

$$\mu^*(A) + \mu^*(E - A) < 1.$$

Then, by the definition of a greatest lower bound, there are systems of rectangles $\{P_j\}$ and $\{Q_k\}$ covering $A$ and $E - A$, respectively, such that

$$\sum_j m(P_j) + \sum_k m(Q_k) < 1.$$

Let $\{R_i\}$ denote the union of the systems $\{P_j\}$ and $\{Q_k\}$. Then

$$E \subset \bigcup_i R_i,$$

while

$$m(E) = 1 > \sum_i m(R_i),$$

contrary to Theorem 2. |


Definition 4. A set $A$ is said to be (Lebesgue) measurable if

$$\mu_*(A) = \mu^*(A),$$

i.e., if its inner and outer measures coincide.

Definition 5. If a set $A$ is measurable, the number $\mu(A)$ equal to the common value of $\mu_*(A)$ and $\mu^*(A)$ is called the (Lebesgue) measure of $A$.

For outer measure, we have the following analogue of Theorem 2: 

Theorem 4. If $A$ is any set and $\{A_n\}$ is a finite or countable system of sets such that

$$A \subset \bigcup_n A_n,$$

then

$$\mu^*(A) \le \sum_n \mu^*(A_n). \qquad (2')$$

Proof. Given any $\varepsilon > 0$, for each $A_n$ there is a finite or countable system of rectangles $\{P_{nk}\}$ such that

$$A_n \subset \bigcup_k P_{nk}$$

and

$$\sum_k m(P_{nk}) \le \mu^*(A_n) + \frac{\varepsilon}{2^n},$$

by the definition of outer measure. Then

$$A \subset \bigcup_n \bigcup_k P_{nk}$$

and

$$\mu^*(A) \le \sum_n \sum_k m(P_{nk}) \le \sum_n \mu^*(A_n) + \varepsilon,$$

which implies (2'), since $\varepsilon > 0$ is arbitrary. |

Corollary. If $A$ is any measurable set and $\{A_n\}$ is a finite or countable system of measurable sets such that

$$A \subset \bigcup_n A_n,$$

then

$$\mu(A) \le \sum_n \mu(A_n). \qquad (2'')$$

Proof. Merely replace $\mu^*$ by $\mu$ in (2'). |

Next we show that the Lebesgue measure of an elementary set coincides 
with its measure as previously defined: 

Theorem 5. Every elementary set $A \subset E$ is measurable, with Lebesgue measure $\mu(A)$ equal to the measure $m(A)$ introduced in Definition 1.

Proof. Suppose $A$ is the union of the pairwise disjoint rectangles $P_1, \dots, P_k$. Then

$$m(A) = \sum_{j=1}^{k} m(P_j),$$

by Definition 1. Therefore, since the rectangles $P_1, \dots, P_k$ obviously cover $A$,

$$\mu^*(A) \le \sum_j m(P_j) = m(A), \qquad (3)$$

by Definition 2. Moreover, if $\{Q_j\}$ is any finite or countable system of rectangles covering $A$, we have

$$m(A) \le \sum_j m(Q_j)$$

by Theorem 2, and hence

$$m(A) \le \mu^*(A), \qquad (4)$$

by Definition 2 again. Comparing (3) and (4), we get

$$m(A) = \mu^*(A).$$

Now $E - A$ is also an elementary set, and hence

$$m(E - A) = \mu^*(E - A).$$

Since

$$m(E - A) = 1 - m(A),$$

it follows that

$$\mu^*(E - A) = 1 - m(A),$$

and hence

$$m(A) = 1 - \mu^*(E - A) = \mu_*(A).$$

Thus $m(A) = \mu_*(A) = \mu^*(A)$. |
Corollary. Theorem 2 is a special case of Theorem 4.

Proof. Merely replace $\mu^*$ by $m$ in (2') or $\mu$ by $m$ in (2'').

Lemma. The inequality

$$|\mu^*(A) - \mu^*(B)| \le \mu^*(A \,\Delta\, B) \qquad (5)$$

holds for any two sets $A$ and $B$.

Proof. Since

$$A \subset B \cup (A \,\Delta\, B),$$

it follows from Theorem 4 that

$$\mu^*(A) \le \mu^*(B) + \mu^*(A \,\Delta\, B). \qquad (6)$$

This implies (5) if $\mu^*(A) \ge \mu^*(B)$. If $\mu^*(A) < \mu^*(B)$, we deduce (5) from the inequality

$$\mu^*(B) \le \mu^*(A) + \mu^*(A \,\Delta\, B)$$

obtained by interchanging the roles of $A$ and $B$ in (6). |
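For elementary sets, where $\mu^* = m$ by Theorem 5, the lemma can be tested on random examples. The sketch below is a toy model chosen for convenience: it only considers elementary sets that are unions of cells of a hypothetical $N \times N$ grid of half-open squares in $E$, stored as Python sets of cell indices, so that union, difference and symmetric difference become set operations and $m$ is the number of cells divided by $N^2$.

```python
import random
from fractions import Fraction

N = 20  # hypothetical grid resolution: cells [i/N, (i+1)/N) x [j/N, (j+1)/N) in E

def measure(cells):
    """m of an elementary set made of grid cells: (number of cells) / N^2."""
    return Fraction(len(cells), N * N)

def random_elementary_set(rng, density):
    """A random union of grid cells, each kept with the given probability."""
    return {(i, j) for i in range(N) for j in range(N) if rng.random() < density}

rng = random.Random(2024)
smallest_slack = None
for _ in range(200):
    A = random_elementary_set(rng, rng.random())
    B = random_elementary_set(rng, rng.random())
    # The lemma: |m(A) - m(B)| <= m(A delta B); Python's ^ is the symmetric difference.
    slack = measure(A ^ B) - abs(measure(A) - measure(B))
    if smallest_slack is None or slack < smallest_slack:
        smallest_slack = slack
print("smallest value of m(A delta B) - |m(A) - m(B)|:", smallest_slack)
```

Equality occurs exactly when one set contains the other, as a small hand example confirms.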

Theorem 6. A set $A$ is measurable if and only if, given any $\varepsilon > 0$, there is an elementary set $B$ such that

$$\mu^*(A \,\Delta\, B) < \varepsilon. \qquad (7)$$

Proof. Suppose that given any $\varepsilon > 0$, there is an elementary set $B$ such that (7) holds. Then, by the lemma and Theorem 5,

$$|\mu^*(A) - \mu^*(B)| = |\mu^*(A) - m(B)| < \varepsilon, \qquad (8)$$

and similarly

$$|\mu^*(E - A) - m(E - B)| < \varepsilon, \qquad (9)$$

since

$$(E - A) \,\Delta\, (E - B) = A \,\Delta\, B.$$

Bearing in mind that

$$m(B) + m(E - B) = m(E) = 1,$$

we deduce from (8) and (9) that

$$|\mu^*(A) + \mu^*(E - A) - 1| < 2\varepsilon,$$

and hence that

$$\mu^*(A) + \mu^*(E - A) = 1, \qquad (10)$$

since $\varepsilon > 0$ is arbitrary. But then $\mu_*(A) = \mu^*(A)$, so that $A$ is measurable.

Conversely, suppose $A$ is measurable, i.e., suppose (10) holds. Then, given any $\varepsilon > 0$, there are systems of rectangles $\{B_n\}$ and $\{C_n\}$ covering $A$ and $E - A$, respectively, such that

$$\sum_n m(B_n) \le \mu^*(A) + \frac{\varepsilon}{3}, \qquad \sum_n m(C_n) \le \mu^*(E - A) + \frac{\varepsilon}{3}. \qquad (11)$$

Moreover, since $\sum_n m(B_n) < \infty$, there is an $N$ such that

$$\sum_{n > N} m(B_n) < \frac{\varepsilon}{3}. \qquad (12)$$
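We now show that (7) holds for the elementary set

$$B = \bigcup_{n=1}^{N} B_n.$$

Clearly, the set

$$P = \bigcup_{n > N} B_n$$

contains $A - B$, while the set

$$Q = \bigcup_n (B \cap C_n)$$

contains $B - A$, and hence

$$A \,\Delta\, B \subset P \cup Q. \qquad (13)$$

Moreover,

$$\mu^*(P) \le \sum_{n > N} m(B_n) < \frac{\varepsilon}{3}. \qquad (14)$$

To estimate $\mu^*(Q)$, we note that

$$\Big(\bigcup_n B_n\Big) \cup \Big(\bigcup_n (C_n - B)\Big) \supset E,$$

and hence

$$\sum_n m(B_n) + \sum_n m(C_n - B) \ge 1. \qquad (15)$$

But (10) and (11) imply

$$\sum_n m(B_n) + \sum_n m(C_n) \le \mu^*(A) + \mu^*(E - A) + \frac{2\varepsilon}{3} = 1 + \frac{2\varepsilon}{3}. \qquad (16)$$

Subtracting (15) from (16), we get

$$\sum_n m(C_n) - \sum_n m(C_n - B) = \sum_n m(C_n \cap B) \le \frac{2\varepsilon}{3},$$

and hence, by Theorem 4,

$$\mu^*(Q) \le \frac{2\varepsilon}{3}. \qquad (17)$$

Finally, comparing (13), (14) and (17), we find that

$$\mu^*(A \,\Delta\, B) \le \mu^*(P \cup Q) \le \mu^*(P) + \mu^*(Q) < \varepsilon. \quad |$$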
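Theorem 6 can be seen at work on the quarter disk $D = \{(x, y) \in E : x^2 + y^2 < 1\}$. The sketch below is an illustration, not taken from the text: it uses a grid of $N \times N$ half-open squares, lets $B_{\text{in}}$ be the union of the squares contained in $D$ and $B_{\text{out}}$ the union of the squares meeting $D$, so that $B_{\text{in}} \subset D \subset B_{\text{out}}$ and $\mu^*(D \,\Delta\, B_{\text{in}}) \le m(B_{\text{out}}) - m(B_{\text{in}})$. The counts show this difference stays below $2/N$, which tends to zero, and both bounds squeeze $\pi/4$.

```python
import math

def quarter_disk_bounds(N):
    """Return (m(B_in), m(B_out)) for the grid squares [i/N, (i+1)/N) x [j/N, (j+1)/N)
    contained in, resp. meeting, the quarter disk x^2 + y^2 < 1 in the unit square."""
    inner = outer = 0
    for i in range(N):
        for j in range(N):
            # The lower-left corner is the point of the square closest to the origin,
            # so the square meets the disk exactly when that corner lies in it.
            if i * i + j * j < N * N:
                outer += 1
                # Every point of the half-open square has x^2 + y^2 below the value
                # at the (excluded) upper-right corner.
                if (i + 1) ** 2 + (j + 1) ** 2 <= N * N:
                    inner += 1
    return inner / N**2, outer / N**2

for N in (10, 100, 400):
    lo, hi = quarter_disk_bounds(N)
    print(N, lo, hi, hi - lo, math.pi / 4)
```

Only squares crossed by the circular arc contribute to the difference, and a monotone arc crosses at most $2N - 1$ of them, which explains the $2/N$ bound.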

Theorem 7. The union and intersection of a finite number of measurable 
sets are again measurable sets. 

Proof. It is enough to prove the theorem for two sets. Thus suppose $A_1$ and $A_2$ are measurable sets. Then, by Theorem 6, given any $\varepsilon > 0$, there are elementary sets $B_1$ and $B_2$ such that

$$\mu^*(A_1 \,\Delta\, B_1) < \frac{\varepsilon}{2}, \qquad \mu^*(A_2 \,\Delta\, B_2) < \frac{\varepsilon}{2}.$$

Since

$$(A_1 \cup A_2) \,\Delta\, (B_1 \cup B_2) \subset (A_1 \,\Delta\, B_1) \cup (A_2 \,\Delta\, B_2),$$

we have

$$\mu^*[(A_1 \cup A_2) \,\Delta\, (B_1 \cup B_2)] \le \mu^*(A_1 \,\Delta\, B_1) + \mu^*(A_2 \,\Delta\, B_2) < \varepsilon.$$

But $B_1 \cup B_2$ is an elementary set, and hence $A_1 \cup A_2$ is measurable, by Theorem 6 again. Moreover, a set $A$ is measurable if and only if

$$\mu^*(A) + \mu^*(E - A) = 1,$$

and hence if $A$ is measurable, so is $E - A$. Therefore the measurability of $A_1 \cap A_2$ follows from that of $A_1 \cup A_2$ and the formula

$$A_1 \cap A_2 = E - [(E - A_1) \cup (E - A_2)]. \quad |$$

Corollary. The difference and symmetric difference of two measurable sets are again measurable sets.

Proof. An immediate consequence of Theorem 7 and the formulas

$$A_1 - A_2 = A_1 \cap (E - A_2), \qquad A_1 \mathbin{\Delta} A_2 = (A_1 - A_2) \cup (A_2 - A_1). \qquad ■$$
Theorem 8. If $A_1, \dots, A_N$ are pairwise disjoint measurable sets, then

$$\mu\Big(\bigcup_{n=1}^{N} A_n\Big) = \sum_{n=1}^{N} \mu(A_n).$$

Proof. As in the proof of Theorem 7, we need only consider the case $N = 2$. By Theorem 6, given any $\varepsilon > 0$, there are elementary sets $B_1$ and $B_2$ such that

$$\mu^*(A_1 \mathbin{\Delta} B_1) < \varepsilon, \qquad \mu^*(A_2 \mathbin{\Delta} B_2) < \varepsilon. \tag{18}$$

Let

$$A = A_1 \cup A_2, \qquad B = B_1 \cup B_2.$$

Then $A$ is measurable, by Theorem 7. Since $A_1$ and $A_2$ are disjoint, we have

$$B_1 \cap B_2 \subset (A_1 \mathbin{\Delta} B_1) \cup (A_2 \mathbin{\Delta} B_2),$$

and hence

$$m(B_1 \cap B_2) \le 2\varepsilon. \tag{19}$$

Moreover, it follows from (18) and the lemma on p. 260 that

$$|m(B_1) - \mu^*(A_1)| < \varepsilon, \qquad |m(B_2) - \mu^*(A_2)| < \varepsilon. \tag{20}$$

Since measure is additive on elementary sets, it follows from (19) and (20) that

$$m(B) = m(B_1) + m(B_2) - m(B_1 \cap B_2) \ge \mu^*(A_1) + \mu^*(A_2) - 4\varepsilon.$$

Noting also that

$$A \mathbin{\Delta} B \subset (A_1 \mathbin{\Delta} B_1) \cup (A_2 \mathbin{\Delta} B_2),$$

we have

$$\mu^*(A) \ge m(B) - \mu^*(A \mathbin{\Delta} B) \ge m(B) - 2\varepsilon \ge \mu^*(A_1) + \mu^*(A_2) - 6\varepsilon.$$

Therefore

$$\mu^*(A) \ge \mu^*(A_1) + \mu^*(A_2), \tag{21}$$

since $\varepsilon > 0$ can be made arbitrarily small. On the other hand, it follows from $A = A_1 \cup A_2$ and Theorem 4 that

$$\mu^*(A) \le \mu^*(A_1) + \mu^*(A_2). \tag{22}$$

Comparing (21) and (22), we get

$$\mu^*(A) = \mu^*(A_1) + \mu^*(A_2),$$

where $\mu^*$ can be replaced by $\mu$, since $A_1$, $A_2$ and $A$ are measurable. ■
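The proof of Theorem 8 rests on two purely set-theoretic inclusions: $B_1 \cap B_2 \subset (A_1 \mathbin{\Delta} B_1) \cup (A_2 \mathbin{\Delta} B_2)$ when $A_1 \cap A_2 = \varnothing$, and $A \mathbin{\Delta} B \subset (A_1 \mathbin{\Delta} B_1) \cup (A_2 \mathbin{\Delta} B_2)$. The Python sketch below is our own illustration, not part of the text: it replaces subsets of $E$ by subsets of a small finite universe and tests every configuration with $A_1 \cap A_2 = \varnothing$. Since both inclusions are decided point by point, a small universe already covers every case. The function names are hypothetical.

```python
from itertools import product

def sym_diff(a, b):
    """Symmetric difference A Δ B of two Python sets."""
    return (a - b) | (b - a)

def check_theorem8_inclusions(n):
    """Exhaustively check, over the universe {0, ..., n-1}, that whenever
    A1 ∩ A2 = ∅:
        B1 ∩ B2 ⊆ (A1 Δ B1) ∪ (A2 Δ B2)                  (used for (19))
        (A1 ∪ A2) Δ (B1 ∪ B2) ⊆ (A1 Δ B1) ∪ (A2 Δ B2)
    Each point gets an A-label (1: in A1, 2: in A2, 0: in neither) and
    independent membership bits for B1 and B2."""
    universe = range(n)
    for a_labels in product((0, 1, 2), repeat=n):
        a1 = {x for x in universe if a_labels[x] == 1}
        a2 = {x for x in universe if a_labels[x] == 2}
        for b_labels in product(range(4), repeat=n):  # bit 0 -> B1, bit 1 -> B2
            b1 = {x for x in universe if b_labels[x] & 1}
            b2 = {x for x in universe if b_labels[x] & 2}
            bound = sym_diff(a1, b1) | sym_diff(a2, b2)
            if not (b1 & b2) <= bound:
                return False
            if not sym_diff(a1 | a2, b1 | b2) <= bound:
                return False
    return True

print(check_theorem8_inclusions(3))  # True
```

The check says nothing about measure; it only confirms the two inclusions that the estimates (19) and (21) are built on.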

Theorem 9. The union and intersection of a countable number of measurable sets are again measurable sets.

Proof. Given a countable system of measurable sets $\{A_n\}$, let

$$A = \bigcup_{n=1}^{\infty} A_n,$$

and let

$$A'_1 = A_1, \qquad A'_n = A_n - \bigcup_{k=1}^{n-1} A_k \quad (n = 2, 3, \dots).$$

Then the sets $A'_n$ are pairwise disjoint, and

$$A = \bigcup_{n=1}^{\infty} A'_n.$$

By Theorem 7 and its corollary, the sets $A'_n$ are all measurable. Moreover, by Theorems 4 and 8,

$$\sum_{n=1}^{N} \mu(A'_n) = \mu\Big(\bigcup_{n=1}^{N} A'_n\Big) \le \mu^*(A)$$

for every $N = 1, 2, \dots$. Therefore the series

$$\sum_{n=1}^{\infty} \mu(A'_n)$$

converges, and hence, given any $\varepsilon > 0$, there is an integer $\nu > 0$ such that

$$\sum_{n > \nu} \mu(A'_n) < \frac{\varepsilon}{2}. \tag{23}$$

Since the set

$$C = \bigcup_{n=1}^{\nu} A'_n$$

is measurable, being the union of a finite number of measurable sets, there is an elementary set $B$ such that

$$\mu^*(C \mathbin{\Delta} B) < \frac{\varepsilon}{2}. \tag{24}$$

Moreover, since

$$A \mathbin{\Delta} B \subset (C \mathbin{\Delta} B) \cup \Big(\bigcup_{n > \nu} A'_n\Big),$$

it follows from (23) and (24) that

$$\mu^*(A \mathbin{\Delta} B) < \varepsilon.$$

Therefore $A$ is measurable, by Theorem 6. Finally, since complements of measurable sets are themselves measurable, the intersection

$$\bigcap_{n=1}^{\infty} A_n = E - \bigcup_{n=1}^{\infty} (E - A_n)$$

is measurable. ■
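The passage from $\{A_n\}$ to the pairwise disjoint sets $\{A'_n\}$ in the proof of Theorem 9 is a general device. A minimal Python sketch, applied (as an assumption, for illustration only) to a finite initial segment of the sequence and to finite sets in place of measurable ones:

```python
def disjointify(sets):
    """A'_1 = A_1 and A'_n = A_n - (A_1 ∪ ... ∪ A_{n-1}), as in the proof of
    Theorem 9, applied to a finite initial segment of the sequence."""
    covered = set()   # A_1 ∪ ... ∪ A_{n-1}
    result = []
    for a in sets:
        result.append(set(a) - covered)
        covered |= set(a)
    return result

A = [{1, 2, 3}, {2, 3, 4, 5}, {5, 6}, {1, 6, 7}]
A_prime = disjointify(A)
print([sorted(s) for s in A_prime])  # [[1, 2, 3], [4, 5], [6], [7]]
```

The new sets have the same union as the old ones and never overlap, which is exactly what lets the proof apply Theorem 8 to finite unions of the $A'_n$.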

Theorem 9 generalizes Theorem 7 to the case of a countable number of measurable sets. The corresponding generalization of Theorem 8 is given by

Theorem 10. If $\{A_n\}$ is a sequence of pairwise disjoint measurable sets, then

$$\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} \mu(A_n). \tag{25}$$

Proof. Let

$$A = \bigcup_{n=1}^{\infty} A_n.$$

Then, since

$$\bigcup_{n=1}^{N} A_n \subset A$$

for every $N = 1, 2, \dots$, it follows from Theorem 8 and the corollary to Theorem 4 that

$$\sum_{n=1}^{N} \mu(A_n) = \mu\Big(\bigcup_{n=1}^{N} A_n\Big) \le \mu(A).$$

Taking the limit as $N \to \infty$, we get

$$\sum_{n=1}^{\infty} \mu(A_n) \le \mu(A). \tag{26}$$

On the other hand, since obviously

$$A \subset \bigcup_{n=1}^{\infty} A_n,$$

it follows from the same corollary that

$$\mu(A) \le \sum_{n=1}^{\infty} \mu(A_n). \tag{27}$$

Comparing (26) and (27), we get

$$\mu(A) = \sum_{n=1}^{\infty} \mu(A_n),$$

or equivalently (25). ■

The key property of the measure $\mu$ expressed by (25) is described by saying that $\mu$ is countably additive or σ-additive.

Theorem 11. Let $\{A_n\}$ be a sequence of measurable sets which is decreasing in the sense that

$$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots.$$

Then

$$\lim_{n \to \infty} \mu(A_n) = \mu(A), \tag{28}$$

where

$$A = \bigcap_{n=1}^{\infty} A_n.$$

Proof. We need only consider the case $A = \varnothing$, to which the general case reduces if $A_n$ is replaced by $A_n - A$. Clearly

$$A_1 = (A_1 - A_2) \cup (A_2 - A_3) \cup \cdots,$$

and

$$A_n = (A_n - A_{n+1}) \cup (A_{n+1} - A_{n+2}) \cup \cdots.$$

Therefore, by the σ-additivity of $\mu$,

$$\mu(A_1) = \sum_{k=1}^{\infty} \mu(A_k - A_{k+1}) \tag{29}$$

and

$$\mu(A_n) = \sum_{k=n}^{\infty} \mu(A_k - A_{k+1}). \tag{30}$$

Since the series (29) converges, its remainder (30) approaches 0 as $n \to \infty$. It follows that

$$\lim_{n \to \infty} \mu(A_n) = 0 = \mu(\varnothing). \qquad ■$$

Corollary. Let $\{A_n\}$ be a sequence of measurable sets which is increasing in the sense that

$$A_1 \subset A_2 \subset \cdots \subset A_n \subset \cdots.$$

Then

$$\lim_{n \to \infty} \mu(A_n) = \mu(A), \tag{28'}$$

where

$$A = \bigcup_{n=1}^{\infty} A_n.$$

Proof. Apply Theorem 11 to the complements of the sets $A_n$. ■

The property of the measure $\mu$ expressed by (28) and (28') is described by saying that $\mu$ is continuous.
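To see formulas (29) and (30) at work, take (as an assumed example on the line rather than in the unit square) $A_n = (0, 1/n]$, a decreasing sequence with empty intersection. Then $\mu(A_k - A_{k+1}) = 1/k - 1/(k+1)$, and the remainders (30) telescope to $\mu(A_n) = 1/n \to 0$. The sketch below computes truncated remainders exactly with Python's `fractions.Fraction`; the helper names are hypothetical.

```python
from fractions import Fraction

def mu_A(n):
    """Measure of A_n = (0, 1/n]."""
    return Fraction(1, n)

def mu_piece(k):
    """Measure of A_k - A_{k+1} = (1/(k+1), 1/k]."""
    return Fraction(1, k) - Fraction(1, k + 1)

def tail(n, terms):
    """Partial sum of the series (30): mu(A_k - A_{k+1}) for k = n, ..., n + terms - 1.
    It telescopes to 1/n - 1/(n + terms), which tends to mu(A_n) = 1/n."""
    return sum((mu_piece(k) for k in range(n, n + terms)), Fraction(0))

for n in (1, 10, 100):
    # truncated remainder of (29) versus mu(A_n); both shrink like 1/n
    print(n, float(tail(n, 10_000)), float(mu_A(n)))
```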

Remark 1. To recapitulate, starting from a measure $m$ defined on the class $\mathcal{S}_m$ of all rectangles (with sides parallel to the coordinate axes), we have succeeded in extending $m$ first to a measure $\bar m$ defined on the larger class $\mathcal{R}(\mathcal{S}_m)$ of all elementary sets and then to a Lebesgue measure $\mu$ defined on the still larger class $\mathcal{S}_\mu$ of all measurable sets. The class $\mathcal{S}_\mu$ is closed under the operations of taking countable unions and intersections. Moreover, the measure $\mu$ is σ-additive on $\mathcal{S}_\mu$.
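The first step of this construction, the measure $\bar m$ of an elementary set, can be computed directly. The Python sketch below is our own illustration (the function name and the tuple format `(x0, x1, y0, y1)` are assumptions): it cuts the plane along all rectangle edges, so that the union becomes a finite disjoint union of grid cells, and adds up the areas of the cells it contains.

```python
from fractions import Fraction

def union_area(rects):
    """Area m(B) of the elementary set B = union of axis-parallel rectangles.

    Each rectangle is a tuple (x0, x1, y0, y1).  The distinct x- and
    y-coordinates cut the plane into grid cells; every cell lies either
    inside some rectangle or outside all of them, so B is a finite disjoint
    union of cells (boundaries have area zero) and m(B) is the sum of the
    areas of the cells it contains."""
    xs = sorted({c for r in rects for c in r[:2]})
    ys = sorted({c for r in rects for c in r[2:]})
    total = Fraction(0)
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            cx, cy = xs[i], ys[j]  # lower-left corner of the cell
            # a cell lies in a rectangle iff its lower-left corner does,
            # because no rectangle edge cuts through the interior of a cell
            if any(x0 <= cx < x1 and y0 <= cy < y1 for x0, x1, y0, y1 in rects):
                total += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    return total

F = Fraction
B1 = (F(0), F(1, 2), F(0), F(1, 2))
B2 = (F(1, 4), F(3, 4), F(1, 4), F(3, 4))
print(union_area([B1, B2]))  # 7/16 = 1/4 + 1/4 - 1/16
```

The printed value agrees with $m(B_1) + m(B_2) - m(B_1 \cap B_2)$, the identity used in the proof of Theorem 8.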

Remark 2. So far we have required all our sets to be subsets of the closed unit square

$$E = \{(x, y) : 0 \le x \le 1,\ 0 \le y \le 1\}.$$

It is easy to get rid of this restriction. For example, representing the whole plane as the union of the squares

$$E_{mn} = \{(x, y) : m \le x \le m + 1,\ n \le y \le n + 1\},$$

where $m$ and $n$ are arbitrary integers, we say that a plane set $A$ is measurable if its intersection $A_{mn} = A \cap E_{mn}$ with every square $E_{mn}$ is measurable as previously defined and if the series

$$\sum_{m,n} \mu(A_{mn})$$

converges. The measure of $A$ is then defined as

$$\mu(A) = \sum_{m,n} \mu(A_{mn}). \tag{31}$$

All the properties of measure proved above carry over to this more general case in a straightforward way (give the details).

Remark 3. We might go still further, calling a set $A$ measurable with "infinite measure" if every $A_{mn}$ is measurable and if the series (31) diverges. Alternatively, we can regard the whole plane as the union of the squares

$$E_n = \{(x, y) : -n \le x \le n,\ -n \le y \le n\},$$

calling a plane set $A$ measurable, with (possibly infinite) measure

$$\mu(A) = \lim_{n \to \infty} \mu(A_n), \tag{32}$$

if its intersection $A_n = A \cap E_n$ with every square $E_n$ is measurable as previously defined. As an exercise, prove the consistency of (31) and (32).

Problem 1. Let E be the closed unit square. Prove that 

a) Every open subset of E is measurable; 

b) Every closed subset of E is measurable; 

c) Every set obtained from open and closed subsets of E by forming no more than a countable number of unions, intersections and complements is measurable.

Comment. There are measurable subsets of E which are not of the type c). 

Problem 2. Construct a theory of Lebesgue measure for sets on the line, starting from intervals (closed, open and half-open) instead of rectangles. Do the same for

a) Sets on the circumference of a circle;

b) Three-dimensional sets;

c) Sets in $\mathbb{R}^n$.

Problem 3. Prove that the set of all rational points on the line is measurable, with measure zero.

Problem 4. Prove that the Cantor set constructed in Example 4, p. 52 is measurable, with measure zero.
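A useful first observation for Problem 4: after $n$ stages of removing open middle thirds, the Cantor set lies inside $2^n$ closed intervals of total length $(2/3)^n$, so its outer measure is at most $(2/3)^n$ for every $n$. The Python sketch below (our own illustration, with hypothetical helper names) lists those intervals and sums their lengths exactly.

```python
from fractions import Fraction

def cantor_stage(steps):
    """Closed intervals left in [0, 1] after `steps` removals of open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(steps):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left third
            next_intervals.append((b - third, b))   # keep the right third
        intervals = next_intervals
    return intervals

def total_length(intervals):
    return sum((b - a for a, b in intervals), Fraction(0))

for n in (1, 4, 10):
    # the Cantor set is covered by these 2**n intervals of total length (2/3)**n
    stage = cantor_stage(n)
    print(n, len(stage), total_length(stage))
```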

Problem 5. Prove that every set of positive measure in the interval [0, 1] 
contains a pair of points whose distance apart is a rational number. 

Problem 6. Show that the power of the set of all measurable subsets of 
the interval [0, 1] is greater than the power of the continuum. 

Problem 7. Let $C$ be a circle of circumference 1, and let $\alpha$ be an irrational number. Let all points of $C$ which can be obtained from each other by rotating $C$ through an angle $n\alpha\pi$ (where $n$ is any integer, positive, negative or zero) be assigned to the same class. (Clearly, each such class contains countably many points.) Let $\Phi_0$ be any set containing one point from each class. Prove that $\Phi_0$ is nonmeasurable.

Hint. Let $\Phi_n$ be the set obtained by rotating $\Phi_0$ through the angle $n\alpha\pi$. Then

$$C = \bigcup_{n=-\infty}^{\infty} \Phi_n$$

and

$$\Phi_n \cap \Phi_m = \varnothing \quad (n \ne m).$$

If $\Phi_0$ were measurable, the congruent sets $\Phi_n$ would also be measurable. This would imply

$$\sum_{n=-\infty}^{\infty} \mu(\Phi_n) = 1, \tag{33}$$

by the σ-additivity of $\mu$. But congruent sets must have the same measure, i.e., if $\Phi_0$ were measurable, then

$$\mu(\Phi_n) = \mu(\Phi_0),$$

so that the sum in (33) would be either 0 or $\infty$, which contradicts (33).

26. General Measure Theory

26.1. Measure on a semiring. In Sec. 25 we constructed a theory of measure of plane sets, starting from a measure (area) $m$ defined on the class $\mathcal{S}_m$ of all rectangles (with sides parallel to the coordinate axes) and then extending $m$ to a Lebesgue measure $\mu$ defined on the much larger class $\mathcal{S}_\mu$ of all measurable sets. The explicit formula for the area of a rectangle played no role in this construction. In fact, a moment's thought shows that we only used the following properties of the set function $m$:

1) The domain of definition $\mathcal{S}_m$ of $m$, i.e., the class of all rectangles, is a semiring;¹

2) $m$ is real and nonnegative;

3) $m$ is additive in the sense that if $P$ is a rectangle such that

$$P = \bigcup_{k=1}^{n} P_k,$$

where $P_1, \dots, P_n$ are pairwise disjoint rectangles, then

$$m(P) = \sum_{k=1}^{n} m(P_k).$$

As will be shown in this section and the next, the construction given in Sec. 25 for the case of plane sets can be carried out in an abstract setting, whose very generality greatly enhances its range of applicability.

¹ We now draw freely from the material in Sec. 4, on systems of sets.
Guided by the above properties of $m$, we introduce

Definition 1. A set function $\mu(A)$ is called a measure if

1) The domain of definition $\mathcal{S}_\mu$ of $\mu$ is a semiring;

2) $\mu$ is real and nonnegative;

3) $\mu$ is additive in the sense that if $A$ is a set in $\mathcal{S}_\mu$ such that

$$A = \bigcup_{k=1}^{n} A_k,$$

where $A_1, \dots, A_n$ are pairwise disjoint sets in $\mathcal{S}_\mu$, then

$$\mu(A) = \sum_{k=1}^{n} \mu(A_k).$$

Remark. It follows from $\varnothing = \varnothing \cup \varnothing$ that

$$\mu(\varnothing) = 2\mu(\varnothing),$$

and hence

$$\mu(\varnothing) = 0.$$

Theorem 1. Let $\mu$ be a measure on a semiring $\mathcal{S}_\mu$, and suppose the sets $A, A_1, \dots, A_n$, where $A_1, \dots, A_n$ are disjoint subsets of $A$, all belong to $\mathcal{S}_\mu$. Then

$$\sum_{k=1}^{n} \mu(A_k) \le \mu(A).$$

Proof. By Lemma 1, p. 33, there is a finite expansion

$$A = \bigcup_{k=1}^{s} A_k \quad (s \ge n)$$

with $A_1, \dots, A_n$ as its first $n$ terms, where

$$A_k \in \mathcal{S}_\mu, \qquad A_k \cap A_l = \varnothing \quad (k \ne l)$$

for all $k, l = 1, 2, \dots, s$. Hence

$$\sum_{k=1}^{n} \mu(A_k) \le \sum_{k=1}^{s} \mu(A_k) = \mu(A),$$

since $\mu$ is nonnegative and additive. ■

Theorem 2. Let $\mu$ be a measure on a semiring $\mathcal{S}_\mu$, and suppose the sets $A, A_1, \dots, A_n$ all belong to $\mathcal{S}_\mu$ and satisfy the condition

$$A \subset \bigcup_{k=1}^{n} A_k.$$

Then

$$\mu(A) \le \sum_{k=1}^{n} \mu(A_k).$$

Proof. According to Lemma 2, p. 33, there is a finite system of pairwise disjoint sets $B_1, \dots, B_t$ belonging to $\mathcal{S}_\mu$ such that each of the sets $A, A_1, \dots, A_n$ has a finite expansion

$$A = \bigcup_{s \in M_0} B_s, \qquad A_k = \bigcup_{s \in M_k} B_s \quad (k = 1, \dots, n)$$

with respect to certain of the sets $B_s$, where each index $s \in M_0$ belongs to at least one of the sets $M_k$ (recall footnote 16, p. 33). Hence each term in the sum

$$\sum_{s \in M_0} \mu(B_s)$$

appears at least once in the double sum

$$\sum_{k=1}^{n} \sum_{s \in M_k} \mu(B_s).$$

It follows that

$$\mu(A) = \sum_{s \in M_0} \mu(B_s) \le \sum_{k=1}^{n} \sum_{s \in M_k} \mu(B_s) = \sum_{k=1}^{n} \mu(A_k). \qquad ■$$

Corollary. If $A \subset A'$, then $\mu(A) \le \mu(A')$.

Proof. Choose $n = 1$. ■

It will be recalled that the first step in constructing Lebesgue measure of 
plane sets was to extend measure from rectangles to elementary sets, i.e., to 
finite unions of disjoint rectangles. We now consider the abstract analogue 
of this process: 

Definition 2. A measure $\mu$ is called an extension of a measure $m$ if $\mathcal{S}_m \subset \mathcal{S}_\mu$ and $\mu(A) = m(A)$ for every $A \in \mathcal{S}_m$.

Theorem 3. Any measure $m$ defined on a semiring $\mathcal{S}_m$ has a unique extension $\mu$ defined on the ring $\mathcal{R}(\mathcal{S}_m)$, i.e., the minimal ring generated by $\mathcal{S}_m$.

Proof. By Theorem 3, p. 34, every set $A \in \mathcal{R}(\mathcal{S}_m)$ has a finite expansion

$$A = \bigcup_{k=1}^{n} B_k, \tag{1}$$

where the sets $B_1, \dots, B_n$ are pairwise disjoint and belong to $\mathcal{S}_m$. Let

$$\mu(A) = \sum_{k=1}^{n} m(B_k). \tag{2}$$

Then $\mu$ is obviously real, nonnegative and additive. Moreover, the quantity $\mu(A)$ defined by (2) is independent of the expansion (1). In fact, suppose $A$ has another expansion of the form

$$A = \bigcup_{l=1}^{r} C_l, \tag{1'}$$

where the sets $C_1, \dots, C_r$ are pairwise disjoint and belong to $\mathcal{S}_m$. Then, since the intersections $B_k \cap C_l$ all belong to $\mathcal{S}_m$, it follows from the additivity of the measure $m$ that

$$\sum_{k=1}^{n} m(B_k) = \sum_{k=1}^{n} \sum_{l=1}^{r} m(B_k \cap C_l) = \sum_{l=1}^{r} m(C_l),$$

and hence

$$\sum_{l=1}^{r} m(C_l) = \mu(A),$$

as asserted. This proves the existence of the extension $\mu$. To prove the uniqueness of $\mu$, suppose $m$ has another extension $\mu'$, and let $A$ be the set (1). Then, by the additivity of $\mu'$,

$$\mu'(A) = \sum_{k=1}^{n} \mu'(B_k) = \sum_{k=1}^{n} m(B_k) = \mu(A).$$

Hence, since every set $A \in \mathcal{R}(\mathcal{S}_m)$ has a representation of the form (1), the extensions $\mu$ and $\mu'$ coincide. ■

Remark. As already noted, the proof of Theorem 3 is a repetition in 
abstract language of the extension of measure from the semiring of rectangles 
to the minimal ring generated by this semiring, i.e., the class of elementary 
sets. 
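The independence argument in the proof of Theorem 3 can be traced on a concrete semiring. The Python sketch below assumes (purely for illustration) the semiring of half-open intervals $[a, b)$ on the line with $m([a, b)) = b - a$; it evaluates formula (2) on two different disjoint expansions of the same set and on their common refinement by the sets $B_k \cap C_l$. All names are hypothetical.

```python
from fractions import Fraction

# Assumed semiring: half-open intervals [a, b) of the line, with m([a, b)) = b - a.
def m(interval):
    a, b = interval
    return b - a

def intersect(i, j):
    """[a, b) ∩ [c, d) is again a half-open interval (possibly empty, stored as [a, a))."""
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo < hi else (lo, lo)

def mu(expansion):
    """Formula (2): mu(A) = m(B_1) + ... + m(B_n) for a disjoint expansion of A."""
    return sum((m(b) for b in expansion), Fraction(0))

F = Fraction
# Two different disjoint expansions of A = [0, 1) ∪ [2, 3)
B = [(F(0), F(1, 2)), (F(1, 2), F(1)), (F(2), F(3))]
C = [(F(0), F(1, 3)), (F(1, 3), F(1)), (F(2), F(5, 2)), (F(5, 2), F(3))]
# Common refinement by the sets B_k ∩ C_l, as in the proof of Theorem 3
refinement = [intersect(b, c) for b in B for c in C]
print(mu(B), mu(C), mu(refinement))  # 2 2 2
```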

26.2. Countably additive measures. Many problems in analysis involve 
unions of countably many sets, as well as unions of only finitely many sets. 
Correspondingly, the (finite) additivity imposed on measures in Definition 1 
turns out to be inadequate, and it is natural to introduce a stronger kind 
of additivity: 

Definition 3. A measure $\mu$ with domain of definition $\mathcal{S}_\mu$ is said to be countably additive or σ-additive if

$$\mu(A) = \sum_{n=1}^{\infty} \mu(A_n)$$

for all sets $A, A_1, \dots, A_n, \dots \in \mathcal{S}_\mu$ satisfying the conditions

$$A = \bigcup_{n=1}^{\infty} A_n, \qquad A_i \cap A_j = \varnothing \quad (i \ne j).$$

Example. According to Theorem 10, p. 265, Lebesgue measure in the plane is σ-additive.
Theorem 4. Suppose a σ-additive measure $m$ on a semiring $\mathcal{S}_m$ is extended to a measure $\mu$ on the ring $\mathcal{R}(\mathcal{S}_m)$. Then $\mu$ is also σ-additive.

Proof. Suppose

$$A \in \mathcal{R}(\mathcal{S}_m), \qquad B_n \in \mathcal{R}(\mathcal{S}_m) \quad (n = 1, 2, \dots)$$

and

$$A = \bigcup_{n=1}^{\infty} B_n,$$

where

$$B_k \cap B_l = \varnothing \quad (k \ne l).$$

Then, by Theorem 3, p. 34, there exist finite expansions

$$A = \bigcup_j A_j, \qquad B_n = \bigcup_i B_{ni},$$

where $A_j, B_{ni} \in \mathcal{S}_m$ and

$$A_k \cap A_l = \varnothing, \qquad B_{nk} \cap B_{nl} = \varnothing \quad (k \ne l).$$

Let

$$C_{nij} = B_{ni} \cap A_j.$$

Then the sets $C_{nij}$ are pairwise disjoint and

$$A_j = \bigcup_n \bigcup_i C_{nij}, \qquad B_{ni} = \bigcup_j C_{nij}.$$

Therefore

$$m(A_j) = \sum_n \sum_i m(C_{nij}), \tag{3}$$

$$m(B_{ni}) = \sum_j m(C_{nij}), \tag{4}$$

since $m$ is σ-additive on $\mathcal{S}_m$, and moreover

$$\mu(A) = \sum_j m(A_j), \tag{5}$$

$$\mu(B_n) = \sum_i m(B_{ni}), \tag{6}$$

by the definition of the measure $\mu$. Comparing (3)-(6), we find that

$$\mu(A) = \sum_j m(A_j) = \sum_j \sum_n \sum_i m(C_{nij}) = \sum_n \sum_i m(B_{ni}) = \sum_n \mu(B_n)$$

(the sums over $i$ and $j$ are finite, while those over $n$ are convergent). ■
Next we generalize Theorems 1 and 2 to the case of σ-additive measures:

Theorem 1'. Let $\mu$ be a σ-additive measure on a semiring $\mathcal{S}_\mu$, and suppose the sets $A, A_1, \dots, A_k, \dots$, where $A_1, \dots, A_k, \dots$ are pairwise disjoint subsets of $A$, all belong to $\mathcal{S}_\mu$. Then

$$\sum_{k=1}^{\infty} \mu(A_k) \le \mu(A). \tag{7}$$

Proof. By Theorem 1,

$$\sum_{k=1}^{n} \mu(A_k) \le \mu(A)$$

for all $n = 1, 2, \dots$. Taking the limit as $n \to \infty$, we get (7). ■

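Theorem 2'. Let $\mu$ be a σ-additive measure on a semiring $\mathcal{S}_\mu$, and suppose the sets $A, A_1, \dots, A_k, \dots$ all belong to $\mathcal{S}_\mu$ and satisfy the condition

$$A \subset \bigcup_{k=1}^{\infty} A_k.$$

Then

$$\mu(A) \le \sum_{k=1}^{\infty} \mu(A_k). \tag{8}$$

Proof. By Theorem 4, we can assume that $\mu$ is defined on the ring $\mathcal{R}(\mathcal{S}_\mu)$ instead of just on the semiring $\mathcal{S}_\mu$. In fact, if $\mu$ is σ-additive, so is its extension onto $\mathcal{R}(\mathcal{S}_\mu)$, which we continue to denote by $\mu$, and the validity of (8) on $\mathcal{R}(\mathcal{S}_\mu)$ obviously implies its validity on $\mathcal{S}_\mu$. The sets

$$B_n = (A \cap A_n) - \bigcup_{k=1}^{n-1} A_k$$

belong to $\mathcal{R}(\mathcal{S}_\mu)$ and clearly satisfy the conditions

$$A = \bigcup_{n=1}^{\infty} B_n, \qquad B_n \subset A_n, \qquad B_k \cap B_l = \varnothing \quad (k \ne l).$$

Therefore

$$\mu(A) = \sum_{n=1}^{\infty} \mu(B_n) \le \sum_{n=1}^{\infty} \mu(A_n). \qquad ■$$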

n =1 

Problem 1. Let $X = \{x_1, x_2, \dots\}$ be any countable set, and let $p_1, p_2, \dots$ be positive numbers such that

$$\sum_{n=1}^{\infty} p_n = 1.$$

On the set $\mathcal{S}$ of all subsets of $X$, define a measure $\mu$ by the formula

$$\mu(A) = \sum_{x_n \in A} p_n,$$

where the sum is over all $n$ such that $x_n \in A$. Prove that $\mu$ is a σ-additive measure, with $\mu(X) = 1$.

Comment. This kind of measure arises quite naturally in many problems of probability theory.
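A quick numerical look at the measure of Problem 1, under the assumed choice $p_n = 2^{-n}$ (any positive weights summing to 1 would do). A set $A \subset X$ is described by a hypothetical indicator function on the indices $n$, and the infinite sum is truncated, so every value printed is only an approximation with error at most $2^{-60}$.

```python
from fractions import Fraction

def p(n):
    """Assumed weights p_n = 2**-n (n = 1, 2, ...), which are positive and sum to 1."""
    return Fraction(1, 2 ** n)

def mu(indicator, n_terms=60):
    """Truncated value of mu(A) = sum of p_n over all n with x_n in A.
    `indicator(n)` tells whether x_n lies in A; the neglected tail is at
    most 2**-n_terms."""
    return sum((p(n) for n in range(1, n_terms + 1) if indicator(n)), Fraction(0))

def is_even(n):
    return n % 2 == 0

def is_odd(n):
    return n % 2 == 1

def whole_space(n):
    return True

# A = {x_2, x_4, ...} and its complement X - A = {x_1, x_3, ...}
print(float(mu(is_even)), float(mu(is_odd)), float(mu(whole_space)))
```

Here $\mu(A) \approx 1/3$ and $\mu(X - A) \approx 2/3$, and the two truncated values add up exactly to the truncated value of $\mu(X)$.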

Problem 2. Let $X$ be the set of all rational points in the closed unit interval $[0, 1]$, and let $\mathcal{S}_\mu$ be the set of all intersections of the set $X$ with arbitrary closed, open and half-open subintervals of $[0, 1]$, including the degenerate closed intervals consisting of a single point. Prove that $\mathcal{S}_\mu$ is a semiring. Define a measure $\mu$ on $\mathcal{S}_\mu$ by the formula

$$\mu(A_{ab}) = b - a,$$

where $A_{ab}$ is the intersection of $X$ with any of the intervals $[a, b]$, $(a, b)$, $(a, b]$, $[a, b)$. Prove that $\mu$ is additive, but not σ-additive.

Hint. Although $\mu(X) = 1$, $X$ is a countable union of single-element sets, each of measure zero.

Problem 3. Let $\mu$ be a measure which is additive, but not σ-additive. Prove that

a) Theorem 1' continues to hold for $\mu$;

b) Theorem 2' fails to hold for $\mu$.

Hint. Use Problem 2.

Problem 4. Given a measure $\mu$ on a semiring $\mathcal{S}_\mu$, suppose

$$\mu(A) \le \sum_{k=1}^{\infty} \mu(A_k)$$

whenever the sets $A, A_1, \dots, A_k, \dots$ all belong to $\mathcal{S}_\mu$ and satisfy the condition

$$A \subset \bigcup_{k=1}^{\infty} A_k.$$

Prove that $\mu$ is σ-additive.

Comment. It is often easier to verify that $\mu$ has this property than to prove the σ-additivity of $\mu$ directly.

27. Extensions of Measures

Any measure $m$ defined on a semiring $\mathcal{S}_m$ can be extended to a measure $\bar m$ defined on the ring $\mathcal{R}(\mathcal{S}_m)$, i.e., the minimal ring generated by $\mathcal{S}_m$. However, if $m$ is σ-additive, we can extend $m$ to a measure defined on a much larger class of sets than $\mathcal{R}(\mathcal{S}_m)$. This is done by the abstract analogue of the procedure used in Sec. 25.2 to construct Lebesgue measure in the plane. Assuming that $\mathcal{S}_m$ has a unit,² we begin with the analogues of Definitions 2-5, pp. 259-260.

Definition 1. Let $m$ be a σ-additive measure on a semiring $\mathcal{S}_m$ with a unit $E$. Then by the outer measure of a set $A \subset E$ is meant the number

$$\mu^*(A) = \inf_{A \subset \bigcup_k B_k} \sum_k m(B_k),$$

where the greatest lower bound is taken over all coverings of $A$ by a finite or countable system of sets $B_k \in \mathcal{S}_m$.
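To see the infimum in Definition 1 at work, consider (as an assumed special case, in the setting of Problems 2 and 3 of Sec. 25) the semiring of intervals of $E = [0, 1]$ with $m$ equal to length, and let $A$ be the set of rational points. Covering the $k$-th rational point by an interval of length at most $\varepsilon/2^k$ gives a covering of total length less than $\varepsilon$, so $\mu^*(A) \le \varepsilon$ for every $\varepsilon > 0$. The Python sketch below carries this out for a finite prefix of an enumeration of the rationals; the helper names are hypothetical.

```python
from fractions import Fraction

def rational_points(max_den):
    """Distinct rationals p/q in [0, 1] with denominator q <= max_den: a finite
    prefix of an enumeration of all rational points of [0, 1]."""
    points, seen = [], set()
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                points.append(r)
    return points

def covering(points, eps):
    """Cover the k-th point by an interval of length at most eps / 2**k,
    centred at the point and clipped to [0, 1]."""
    intervals = []
    for k, x in enumerate(points, start=1):
        r = eps / 2 ** (k + 1)  # half-length before clipping
        intervals.append((max(Fraction(0), x - r), min(Fraction(1), x + r)))
    return intervals

def total_length(intervals):
    return sum((b - a for a, b in intervals), Fraction(0))

eps = Fraction(1, 100)
pts = rational_points(20)
cover = covering(pts, eps)
print(len(pts), total_length(cover) < eps)
```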

Definition 2. By the inner measure of a set $A \subset E$ is meant the number

$$\mu_*(A) = m(E) - \mu^*(E - A).$$

Remark. By the exact analogue of Theorem 3, p. 258, it follows that

$$\mu_*(A) \le \mu^*(A).$$

Definition 3. A set $A$ is said to be (Lebesgue) measurable if

$$\mu_*(A) = \mu^*(A),$$

i.e., if its inner and outer measures coincide.

Definition 4. If a set $A$ is measurable, the number $\mu(A)$ equal to the common value of $\mu_*(A)$ and $\mu^*(A)$ is called the Lebesgue measure of $A$.³

Remark. Clearly, a set $A \subset E$ is measurable if and only if

$$\mu^*(A) + \mu^*(E - A) = m(E). \tag{1}$$

In particular, it follows from (1) that if $A$ is measurable, so is $E - A$.

Theorem 1. If $A$ is any set and $\{A_n\}$ is any finite or countable system of sets such that

$$A \subset \bigcup_n A_n,$$

then

$$\mu^*(A) \le \sum_n \mu^*(A_n).$$

Proof. Exactly analogous to that of Theorem 4, p. 259. ■

² The case where $\mathcal{S}_m$ fails to have a unit will be discussed later (after Theorem 8).

³ It turns out, of course, that $\mu$ is a measure as defined in Sec. 26.1 (see Theorem 5, where the additivity of $\mu$ is proved). In particular, this justifies the use of the notation $\mathcal{S}_\mu$ for the system of all measurable sets.


Theorem 2. Every set $A \in \mathcal{R}(\mathcal{S}_m)$ is measurable, with Lebesgue measure equal to $\bar m(A)$, where $\bar m$ is the extension of $m$ from the semiring $\mathcal{S}_m$ to the ring $\mathcal{R}(\mathcal{S}_m)$.

Proof. Exactly analogous to that of Theorem 5, p. 259. ■

Theorem 3. A set $A$ is measurable if and only if, given any $\varepsilon > 0$, there is a set $B \in \mathcal{R}(\mathcal{S}_m)$ such that

$$\mu^*(A \mathbin{\Delta} B) < \varepsilon.$$

Proof. Exactly analogous to that of Theorem 6, p. 261. ■

Theorem 4. The system $\mathcal{S}_\mu$ of all measurable sets is a ring.

Proof. Exactly analogous to that of Theorem 7, p. 262 and its corollary. ■

Remark. Obviously $E$ is the unit of $\mathcal{S}_\mu$, so that $\mathcal{S}_\mu$ is an algebra of sets (see p. 31).

Theorem 5. The set function $\mu(A)$ is additive on $\mathcal{S}_\mu$.

Proof. Exactly analogous to that of Theorem 8, p. 263. ■

Theorem 6. The set function $\mu(A)$ is σ-additive on $\mathcal{S}_\mu$.

Proof. Exactly analogous to that of Theorem 10, p. 265. ■

Remark. Thus $\mu$ is a σ-additive measure on the system of all measurable sets. This measure is called the Lebesgue extension of the original measure $m$.

Theorem 7. The system $\mathcal{S}_\mu$ of all measurable sets is a Borel algebra with unit $E$.

Proof. Recall from p. 35 that a Borel algebra is an algebra of sets closed under the operations of taking countable unions and intersections. The proof is the exact analogue of that of Theorem 9, p. 264. ■

It is interesting to note that an arbitrary measurable set can be approximated to within a set of measure zero by a set of a very special kind:

Theorem 8. Given any set $A \in \mathcal{S}_\mu$, there are sets

$$B_{nk} \in \mathcal{R}(\mathcal{S}_m) \qquad (B_{n1} \subset B_{n2} \subset \cdots \subset B_{nk} \subset \cdots)$$

and corresponding sets

$$B_n = \bigcup_k B_{nk} \in \mathcal{S}_\mu \qquad (B_1 \supset B_2 \supset \cdots \supset B_n \supset \cdots)$$

such that

$$A \subset B = \bigcap_n B_n, \qquad \mu(A) = \mu(B).$$

Proof. Given any $n$, we can cover $A$ by a union

$$C_n = \bigcup_r \Delta_{nr}$$

of sets $\Delta_{nr} \in \mathcal{S}_m$ such that

$$\mu(C_n) < \mu(A) + \frac{1}{n}.$$

Let

$$B_n = \bigcap_{k=1}^{n} C_k,$$

so that, in particular, $B_1 \supset B_2 \supset \cdots \supset B_n \supset \cdots$. Then it is easy to see that

$$B_n = \bigcup_s \delta_{ns},$$

where $\delta_{ns} \in \mathcal{S}_m$. Next let

$$B_{nk} = \bigcup_{s=1}^{k} \delta_{ns},$$

so that, in particular,

$$B_n = \bigcup_k B_{nk}.$$

Then obviously $B_{nk} \in \mathcal{R}(\mathcal{S}_m)$ and $B_{n1} \subset B_{n2} \subset \cdots \subset B_{nk} \subset \cdots$. Moreover

$$A \subset B = \bigcap_n B_n,$$

since $B$ is an intersection of sets containing $A$. It follows that

$$\mu(A) \le \mu(B). \tag{2}$$

On the other hand, $B \subset B_n \subset C_n$ for every $n$, and therefore

$$\mu(B) \le \mu(B_n) \le \mu(C_n) < \mu(A) + \frac{1}{n}.$$

Taking the limit as $n \to \infty$, we get

$$\mu(B) \le \mu(A),$$

which, together with (2), implies $\mu(A) = \mu(B)$. ■


Our construction of the Lebesgue extension of a measure $m$ defined on a semiring $\mathcal{S}_m$ must be modified somewhat if $\mathcal{S}_m$ fails to have a unit. We continue to use Definition 1 to define the outer measure $\mu^*$, but $\mu^*$ is now defined only on the system of all sets with coverings

$$\bigcup_k B_k \qquad (B_k \in \mathcal{S}_m)$$

such that

$$\sum_k m(B_k) < \infty.$$

Since Definition 2 is meaningless in the absence of a unit, we now define measurable sets by using the property figuring in Theorem 3:

Definition 3'. A set $A$ is said to be (Lebesgue) measurable if, given any $\varepsilon > 0$, there is a set $B \in \mathcal{R}(\mathcal{S}_m)$ such that $\mu^*(A \mathbin{\Delta} B) < \varepsilon$.

Definition 4'. If a set $A$ is measurable, the number $\mu(A)$ equal to its outer measure $\mu^*(A)$ is called the (Lebesgue) measure of $A$.

Remark. Note that Definitions 3' and 4' are equivalent to Definitions 3 and 4 if $\mathcal{S}_m$ has a unit.

In the case where $\mathcal{S}_m$ has no unit, Theorems 4-6 continue to hold, since the proofs of Theorems 5 and 6 do not require $\mathcal{S}_m$ to have a unit, while the proof of Theorem 4 can easily be freed of this requirement (see Problem 4). However, Theorem 7 now takes a new form (see Problem 5). As before, the σ-additive measure $\mu$ on the system of all measurable sets is called the Lebesgue extension of the original measure $m$.

Remark. There is an interesting analogy between the construction of the Lebesgue extension of a measure $m$ defined on a semiring $\mathcal{S}_m$ and the process of completing a metric space. Let $\bar m$ be the extension of $m$ from the semiring $\mathcal{S}_m$ to the ring $\mathcal{R}(\mathcal{S}_m)$, and suppose we regard $\bar m(A \mathbin{\Delta} B)$ as the distance between the elements $A, B \in \mathcal{R}(\mathcal{S}_m)$. Then $\mathcal{R}(\mathcal{S}_m)$ becomes a metric space (in general, incomplete), whose completion, according to Theorem 3, is just the system of all Lebesgue-measurable sets. However, note that from a metric point of view, two sets $A, B \in \mathcal{S}_\mu$ are indistinguishable if $\mu(A \mathbin{\Delta} B) = 0$.
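The distance $\bar m(A \mathbin{\Delta} B)$ is easy to compute in a concrete ring. The Python sketch below assumes (for illustration only) the ring of finite unions of half-open intervals $[a, b)$ on the line with $\bar m$ equal to total length, and uses the identity $\bar m(A \mathbin{\Delta} B) = 2\bar m(A \cup B) - \bar m(A) - \bar m(B)$, which follows from $\bar m(A \cap B) = \bar m(A) + \bar m(B) - \bar m(A \cup B)$. All names are hypothetical.

```python
from fractions import Fraction

def normalize(intervals):
    """Sorted disjoint list of half-open intervals [a, b) with the same union."""
    merged = []
    for a, b in sorted(i for i in intervals if i[0] < i[1]):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))  # overlap or touch
        else:
            merged.append((a, b))
    return merged

def measure(intervals):
    """Total length of a finite union of half-open intervals."""
    return sum((b - a for a, b in normalize(intervals)), Fraction(0))

def distance(A, B):
    """d(A, B) = m(A Δ B) = m(A ∪ B) - m(A ∩ B) = 2 m(A ∪ B) - m(A) - m(B)."""
    return 2 * measure(A + B) - measure(A) - measure(B)

F = Fraction
A = [(F(0), F(1, 2))]
B = [(F(1, 4), F(3, 4))]
C = [(F(1, 4), F(1, 2)), (F(1, 2), F(3, 4))]   # the same set as B, another expansion
print(distance(A, B), distance(B, C))           # 1/2 0
```

Here $A \mathbin{\Delta} B = [0, 1/4) \cup [1/2, 3/4)$ has length $1/2$, while two expansions of the same set are at distance zero.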

Problem 1. Let $m$ be a σ-additive measure on a semiring $\mathcal{S}_m$ with a unit $E$, let $\mu$ be the Lebesgue extension of $m$, and let $\tilde\mu$ be an arbitrary σ-additive extension of $m$. Prove that $\tilde\mu(A) = \mu(A)$ for every measurable set $A$ on which $\tilde\mu$ is defined.

Hint. First show that $\mu_*(A) \le \tilde\mu(A) \le \mu^*(A)$.

Problem 2. Let $m$ be the same as in the preceding problem, and let $\bar m$ be the extension of $m$ to a measure defined on $\mathcal{R}(\mathcal{S}_m)$. Prove that the outer measure of a set $A \subset E$ is given by

$$\mu^*(A) = \inf_{A \subset \bigcup_k B_k} \sum_k \bar m(B_k),$$

where the greatest lower bound is taken over all coverings of $A$ by a finite or countable system of sets $B_k \in \mathcal{R}(\mathcal{S}_m)$.

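Problem 3. State and prove the analogues of Theorem 11, p. 266 and its corollary for an arbitrary σ-additive measure $\mu$ defined on a Borel algebra with unit $E$.

Problem 4. Give a proof of Theorem 4 valid in the case where $\mathcal{S}_m$ fails to have a unit.

Hint. Suppose $A_1, A_2 \in \mathcal{S}_\mu$. Then $A_1 \cup A_2 \in \mathcal{S}_\mu$, by the same proof as before (cf. p. 262). Moreover, given any $\varepsilon > 0$, there are sets $B_1, B_2 \in \mathcal{R}(\mathcal{S}_m)$ such that

$$\mu^*(A_1 \mathbin{\Delta} B_1) < \frac{\varepsilon}{2}, \qquad \mu^*(A_2 \mathbin{\Delta} B_2) < \frac{\varepsilon}{2}.$$

But

$$(A_1 - A_2) \mathbin{\Delta} (B_1 - B_2) \subset (A_1 \mathbin{\Delta} B_1) \cup (A_2 \mathbin{\Delta} B_2),$$

and hence $\mu^*(A \mathbin{\Delta} B) < \varepsilon$, where $A = A_1 - A_2$ and $B = B_1 - B_2 \in \mathcal{R}(\mathcal{S}_m)$. Therefore $A_1 - A_2 \in \mathcal{S}_\mu$. To prove that $A_1 \cap A_2$ and $A_1 \mathbin{\Delta} A_2$ belong to $\mathcal{S}_\mu$, use the formulas

$$A_1 \cap A_2 = A_1 - (A_1 - A_2), \qquad A_1 \mathbin{\Delta} A_2 = (A_1 - A_2) \cup (A_2 - A_1).$$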

Problem 5. Given a measure $m$ on a semiring $\mathcal{S}_m$ with no unit, let $\mu$ be the Lebesgue extension of $m$ and $\mathcal{S}_\mu$ the corresponding system of all measurable sets. Prove that

a) $\mathcal{S}_\mu$ is a δ-ring (see p. 35);

b) The set

$$A = \bigcup_k A_k \qquad (A_k \in \mathcal{S}_\mu)$$

belongs to $\mathcal{S}_\mu$ if and only if there is a constant $C > 0$ such that

$$\mu\Big(\bigcup_{k=1}^{n} A_k\Big) \le C \quad (n = 1, 2, \dots). \tag{3}$$

Comment. The necessity of the condition (3) is obvious, since our measures are always finite.

Problem 6. Let $\mu$ and $\mathcal{S}_\mu$ be the same as in the preceding problem. Prove that the system of all sets $B \in \mathcal{S}_\mu$ which are subsets of a fixed set $A \in \mathcal{S}_\mu$ is a Borel algebra with unit $A$.

Problem 7. A measure $\mu$ is said to be complete if every subset of a set of measure zero is measurable, i.e., if $A' \subset A$, $\mu(A) = 0$ implies $A' \in \mathcal{S}_\mu$. (If $A' \in \mathcal{S}_\mu$, then obviously $\mu(A') = 0$.) Prove that the Lebesgue extension of any measure $m$ is complete.

Hint. If $A' \subset A$ and $\mu(A) = 0$, then $\mu^*(A') = 0$. But $\varnothing \in \mathcal{R}(\mathcal{S}_m)$ and $\mu^*(A' \mathbin{\Delta} \varnothing) = \mu^*(A') = 0$.

Problem 8. Let $\bar m$ be a measure defined on a ring $\mathcal{R}$. For example, $\bar m$ might be the extension of a measure $m$ originally defined on a semiring $\mathcal{S}_m$ to a measure defined on the minimal ring $\mathcal{R} = \mathcal{R}(\mathcal{S}_m)$ generated by $\mathcal{S}_m$. Then a set $A$ is said to be Jordan measurable if, given any $\varepsilon > 0$, there are sets $A', A'' \in \mathcal{R}$ such that

$$A' \subset A \subset A'', \qquad \bar m(A'' - A') < \varepsilon.$$

Prove that the system $\mathcal{R}^*$ of all Jordan-measurable sets is a ring containing $\mathcal{R}$.

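Problem 9. Let $\bar m$, $\mathcal{R}$ and $\mathcal{R}^*$ be the same as in the preceding problem, and let $\mathcal{A}$ be the system of all sets $A$ such that there is a set $B \in \mathcal{R}$ containing $A$. Given any set $A \in \mathcal{A}$, let

$$\bar\mu(A) = \inf_{B \supset A,\ B \in \mathcal{R}} \bar m(B), \qquad \underline{\mu}(A) = \sup_{B \subset A,\ B \in \mathcal{R}} \bar m(B)$$

(since $\varnothing \subset A$, $A$ always contains a set in $\mathcal{R}$). Prove that

a) $\underline{\mu}(A) \le \bar\mu(A)$;

b) The ring $\mathcal{R}^*$ coincides with the system of all sets $A \in \mathcal{A}$ for which $\underline{\mu}(A) = \bar\mu(A)$;

c) If

$$A \subset \bigcup_{k=1}^{n} A_k,$$

where $A, A_1, \dots, A_n$ all belong to $\mathcal{A}$, then

$$\bar\mu(A) \le \sum_{k=1}^{n} \bar\mu(A_k);$$

d) If $A_1, \dots, A_n$ are pairwise disjoint sets contained in a set $A \in \mathcal{A}$, then

$$\underline{\mu}(A) \ge \sum_{k=1}^{n} \underline{\mu}(A_k).$$

By the Jordan measure of a set $A \in \mathcal{R}^*$, we mean the number $\mu(A)$ equal to the common value of $\underline{\mu}(A)$ and $\bar\mu(A)$. Prove that $\mu$ is a measure on $\mathcal{R}^* = \mathcal{S}_\mu$.

Comment. The measure $\mu$ is called the Jordan extension of the measure $\bar m$. If $\bar m$ is itself an extension of a measure $m$ originally defined on a semiring $\mathcal{S}_m$, we write $\mathcal{R}^* = \mathcal{R}^*(\mathcal{S}_m)$ and call $\mu$ the Jordan extension of the measure $m$, as well as of the "intermediate" measure $\bar m$.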
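Jordan measurability can be watched numerically. Take (as an assumed example) $\mathcal{R}$ to be the ring of elementary plane sets with $\bar m$ equal to area, and let $A$ be the quarter disk $\{(x, y) : x, y \ge 0,\ x^2 + y^2 \le 1\}$. Unions of grid squares of side $1/n$ give sets $A' \subset A \subset A''$ in $\mathcal{R}$, so $\bar m(A') \le \underline{\mu}(A) \le \bar\mu(A) \le \bar m(A'')$. The Python sketch below (hypothetical function name) computes both areas exactly; their gap is at most $3/n$, which is consistent with, though of course not a proof of, the Jordan measurability of $A$.

```python
from fractions import Fraction

def jordan_bounds(n):
    """Inner and outer approximations of the quarter disk
    A = {(x, y) : x >= 0, y >= 0, x**2 + y**2 <= 1}
    by unions of the n*n grid squares of side 1/n in the unit square.

    inner = area of the squares contained in A  (an elementary A' with A' ⊆ A)
    outer = area of the squares that meet A      (an elementary A'' with A ⊆ A'')"""
    inside = meeting = 0
    for i in range(n):
        for j in range(n):
            # the square [i/n, (i+1)/n] x [j/n, (j+1)/n] has its nearest point
            # to the origin at (i, j)/n and its farthest at (i+1, j+1)/n
            if (i + 1) ** 2 + (j + 1) ** 2 <= n * n:
                inside += 1
            if i * i + j * j <= n * n:
                meeting += 1
    return Fraction(inside, n * n), Fraction(meeting, n * n)

for n in (10, 100, 400):
    inner, outer = jordan_bounds(n)
    print(n, float(inner), float(outer), float(outer - inner))
```

Both bounds squeeze the area $\pi/4$ of the quarter disk as $n$ grows.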







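Problem 10. Given two measures $m_1$ and $m_2$ defined on rings $\mathcal{R}_1$ and $\mathcal{R}_2$, let $\mu_1$ and $\mu_2$ be their Jordan extensions onto the larger rings $\mathcal{R}_1^* = \mathcal{S}_{\mu_1}$ and $\mathcal{R}_2^* = \mathcal{S}_{\mu_2}$. Prove that $\mu_1$ and $\mu_2$ coincide if and only if

$$\mathcal{R}_1 \subset \mathcal{S}_{\mu_2}, \quad m_1(A) = \mu_2(A) \text{ for all } A \in \mathcal{R}_1,$$

$$\mathcal{R}_2 \subset \mathcal{S}_{\mu_1}, \quad m_2(A) = \mu_1(A) \text{ for all } A \in \mathcal{R}_2.$$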

Problem 11. Let $m$ be the measure defined in Sec. 25.1 on the ring $\mathcal{R}$ of all elementary sets (i.e., all finite unions of disjoint rectangles with sides parallel to the coordinate axes), and let $\mu$ be the Jordan extension of $m$. Prove that $\mu$ does not depend on the particular choice of the underlying rectangular coordinate system. In other words, prove that $\mu$ (as well as the corresponding ring $\mathcal{R}^* = \mathcal{S}_\mu$) does not change if all the sets in $\mathcal{R}$ are subjected to the same shift and rigid rotation.

Problem 12. We say that a set $A$ is a set of uniqueness for a measure $m$ if

1) There is an extension of $m$ defined on $A$;

2) If $\mu_1$ and $\mu_2$ are two such extensions, then $\mu_1(A) = \mu_2(A)$.

Prove that the system of sets of uniqueness of a measure $m$ defined on a semiring $\mathcal{S}_m$ coincides with the ring $\mathcal{R}^* = \mathcal{R}^*(\mathcal{S}_m)$ of sets which are Jordan measurable (with respect to $m$). In other words, prove that the Jordan extension of a measure $m$ originally defined on a semiring $\mathcal{S}_m$ is the unique extension of $m$ to a measure defined on $\mathcal{R}^* = \mathcal{R}^*(\mathcal{S}_m)$, but that the extension of $m$ to a larger system is no longer unique.

Problem 13. Prove that if a set $A$ is Jordan measurable, then

a) $A$ is Lebesgue measurable;

b) The Jordan and Lebesgue measures of $A$ coincide.

Prove that every Jordan extension of a σ-additive measure is σ-additive.

Problem 14. Give an example of a set which is Lebesgue measurable, but 
not Jordan measurable. 

Problem 15. We say that a set $A$ is a set of σ-uniqueness for a σ-additive measure $m$ if

1) There is a σ-additive extension of $m$ defined on $A$;

2) If $\mu_1$ and $\mu_2$ are two such extensions, then $\mu_1(A) = \mu_2(A)$.

Prove that the system of sets of σ-uniqueness of a σ-additive measure $m$ defined on a semiring $\mathcal{S}_m$ coincides with the system of sets which are Lebesgue measurable (with respect to $m$).

Hint. To show that every Lebesgue-measurable set $A$ is a set of σ-uniqueness for $m$, choose any $\varepsilon > 0$. Then there is a set $B \in \mathcal{R} = \mathcal{R}(\mathcal{S}_m)$ such that $\mu^*(A \mathbin{\Delta} B) < \varepsilon$. If $\mu$ is any σ-additive extension of $m$ defined on $A$ (and on $\mathcal{R}$), then $\mu(B) = \bar m(B)$, where $\bar m$ is the unique extension of $m$ onto $\mathcal{R}$. Moreover, $\mu(A \mathbin{\Delta} B) \le \mu^*(A \mathbin{\Delta} B) < \varepsilon$, and hence $|\mu(A) - \bar m(B)| < \varepsilon$. Therefore $|\mu_1(A) - \mu_2(A)| < 2\varepsilon$ if $\mu_1$ and $\mu_2$ are two σ-additive extensions of $m$ defined on $A$ (and on $\mathcal{R}$). Hence $\mu_1(A) = \mu_2(A)$, by the arbitrariness of $\varepsilon$.

Problem 16. Let $m$ be a σ-additive measure defined on a semiring $\mathcal{S}_m$, and let $\mathcal{S}_\mu$ be the domain of the Lebesgue extension of $m$. Let $m'$ be a σ-additive extension of $m$ to a semiring $\mathcal{S}_{m'}$ such that

$$\mathcal{S}_m \subset \mathcal{S}_{m'} \subset \mathcal{S}_\mu,$$

and let $\mathcal{S}_{\mu'}$ be the domain of the Lebesgue extension of $m'$. Prove that
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28. Measurable Functions 

28.1. Basic properties of measurable functions. Given any two sets Zand 
Y, let 9 be a system of subsets of X and 9' a system of subsets of Y. Then 
an abstract function y — fix) defined on X and taking values in Y is said 
to be (9, 9'(-measurable if A e 9' implies f~ x (A) e 9. 

Example. Let $X$ and $Y$ both be the real line $R^1$, so that $y = f(x)$ is a “function of a real variable.” Moreover, let $\mathcal{S}$ and $\mathcal{S}'$ both be the system of all open (or closed) subsets of $R^1$. Then our definition of measurability reduces to that of continuity (recall Sec. 9.6). On the other hand, if we choose both $\mathcal{S}$ and $\mathcal{S}'$ to be the system $\mathcal{B}^1$ of all Borel sets on the real line (recall p. 36), our definition becomes that of a Borel-measurable (or simply $B$-measurable) function.

In what follows, we will be primarily concerned with the notion of real functions measurable with respect to some underlying measure $\mu$, this being the case of greatest interest from the standpoint of integration theory. More exactly, let $X$ be any set and $Y$ the real line $R^1$, with $\mathcal{S} = \mathcal{S}_\mu$ the domain of definition of some $\sigma$-additive measure $\mu$ and $\mathcal{S}'$ the system $\mathcal{B}^1$ of all Borel sets $B \subset R^1$. For simplicity, we assume that $\mathcal{S}_\mu$ has a unit equal to $X$ itself. Moreover, since any $\sigma$-additive measure can be extended onto a Borel algebra (by Theorem 7, p. 277), we might as well assume from the outset that $\mathcal{S}_\mu$ is a Borel algebra. These considerations suggest

Definition 1. Given a $\sigma$-additive measure $\mu$ defined on a Borel algebra $\mathcal{S}_\mu$ of subsets of a set $X$, where $X$ is the unit of $\mathcal{S}_\mu$, let $y = f(x)$ be a real function defined on $X$, and let $\mathcal{B}^1$ be the set of all Borel sets on the real line. Then the function $f$ is said to be $\mu$-measurable (on $X$) if $f^{-1}(A) \in \mathcal{S}_\mu$ for every $A \in \mathcal{B}^1$, or equivalently if $f^{-1}(\mathcal{B}^1) \subset \mathcal{S}_\mu$.

Theorem 1. A function $f$ is $\mu$-measurable if and only if the set $\{x : f(x) < c\}$ is $\mu$-measurable (i.e., belongs to $\mathcal{S}_\mu$) for every real $c$.

Proof. If $f$ is $\mu$-measurable, then obviously so is $\{x : f(x) < c\}$, since $(-\infty, c)$ is a Borel set. Conversely, let $\Sigma$ be the system of all semi-infinite intervals $(-\infty, c)$, and suppose $f^{-1}(\Sigma) \subset \mathcal{S}_\mu$. Since $\mathcal{B}(\Sigma)$, the Borel closure of $\Sigma$ (see p. 36), coincides with the system $\mathcal{B}^1$ of all Borel sets on the line (why?), we have
$$f^{-1}(\mathcal{B}^1) = f^{-1}(\mathcal{B}(\Sigma)) = \mathcal{B}(f^{-1}(\Sigma)) \subset \mathcal{B}(\mathcal{S}_\mu)$$
(recall Problem 3e, p. 36). But $\mathcal{B}(\mathcal{S}_\mu) = \mathcal{S}_\mu$, since $\mathcal{S}_\mu$ is a Borel algebra, and hence
$$f^{-1}(\mathcal{B}^1) \subset \mathcal{S}_\mu.$$
∎

Theorem 2. Let $\{f_n\}$ be a sequence of $\mu$-measurable functions on $X$, and let $f$ be a function on $X$ such that
$$f(x) = \lim_{n\to\infty} f_n(x)$$
for every $x \in X$. Then $f$ is itself $\mu$-measurable.

Proof. First we verify that
$$\{x : f(x) < c\} = \bigcup_{k}\bigcup_{n}\bigcap_{m > n}\left\{x : f_m(x) < c - \frac{1}{k}\right\}. \tag{1}$$
In fact, if $f(x) < c$, there is an integer $k > 0$ such that
$$f(x) < c - \frac{2}{k},$$
and then for this $k$, there is an integer $n > 0$ so large that
$$f_m(x) < c - \frac{1}{k} \tag{2}$$
for all $m > n$. Therefore every $x$ belonging to the left-hand side of (1) also belongs to the right-hand side. Conversely, if $x$ belongs to the right-hand side of (1), there is a $k$ such that (2) holds for all sufficiently large $m$. Letting $m \to \infty$ in (2), we get $f(x) \le c - 1/k < c$, i.e., $x$ belongs to the left-hand side of (1). Now, since the functions $f_m$ are $\mu$-measurable, the sets
$$\left\{x : f_m(x) < c - \frac{1}{k}\right\}$$
all belong to $\mathcal{S}_\mu$, and hence so does the right-hand side of (1), since $\mathcal{S}_\mu$ is a Borel algebra. Therefore $\{x : f(x) < c\} \in \mathcal{S}_\mu$. But then $f$ is $\mu$-measurable, by Theorem 1. ∎

Theorem 3. A $B$-measurable function of a $\mu$-measurable function is itself $\mu$-measurable.

Proof. Let $f(x) = \varphi[\psi(x)]$, where $\varphi$ is $B$-measurable and $\psi$ is $\mu$-measurable. If $A \subset R^1$ is any Borel set, then its preimage $A' = \varphi^{-1}(A)$ is a Borel set, and hence the preimage $A'' = \psi^{-1}(A')$ is $\mu$-measurable. But $A'' = f^{-1}(A)$, and hence $f$ is $\mu$-measurable. ∎

Corollary. A continuous function of a $\mu$-measurable function is itself $\mu$-measurable.

Proof. A continuous function is clearly $B$-measurable. ∎

28.2. Simple functions. Algebraic operations on measurable functions. A function $f$ is said to be simple if it is $\mu$-measurable and takes no more than countably many distinct values. This notion clearly depends on the choice of the measure $\mu$.

The structure of simple functions is clarified by 

Theorem 4. A function $f$ taking no more than countably many distinct values $y_1, y_2, \ldots$ is $\mu$-measurable if and only if the sets
$$A_n = \{x : f(x) = y_n\} \quad (n = 1, 2, \ldots)$$
are $\mu$-measurable.

Proof. Since each single-element set $\{y_n\}$ is a Borel set, the set $A_n$, being the preimage of $\{y_n\}$, is measurable if $f$ is measurable.¹ Conversely, suppose the sets $A_n$ are all measurable. Then the preimage $f^{-1}(B)$ of any Borel set $B \subset R^1$ is measurable, being a union
$$\bigcup_{y_n \in B} A_n$$
of no more than countably many measurable sets $A_n$. But then $f$ is measurable. ∎

The relation between measurable functions and simple functions is shown by 

Theorem 5. A function $f$ is $\mu$-measurable if and only if it can be represented as the limit of a uniformly convergent sequence of simple functions.


¹ For simplicity, we often say “measurable” instead of “$\mu$-measurable,” omitting explicit reference to the underlying measure $\mu$.


Proof. If $f$ is the (uniform) limit of a convergent sequence of simple functions, then $f$ is $\mu$-measurable by Theorem 2, since simple functions are $\mu$-measurable by definition. Conversely, given any $\mu$-measurable function $f$, let
$$f_n(x) = \frac{m}{n} \quad \text{if} \quad \frac{m}{n} \le f(x) < \frac{m+1}{n},$$
where $m$ ranges over all integers and $n$ is a positive integer. Then the functions $f_n$ are simple and moreover converge uniformly to $f$ as $n \to \infty$, since
$$|f(x) - f_n(x)| < \frac{1}{n}.$$
∎
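The construction in the proof of Theorem 5 can be tried out numerically. The sketch below is only an illustration: the sample function `f` and the helper `simple_approx` are hypothetical choices of ours, not from the text. The expression `math.floor(n * f(x)) / n` is the value $m/n$ with $m/n \le f(x) < (m+1)/n$, up to floating-point rounding.

```python
import math

def simple_approx(f, n):
    """Return f_n from the proof of Theorem 5: f_n(x) = m/n, where m is the
    integer with m/n <= f(x) < (m+1)/n (computed as floor(n * f(x)))."""
    def f_n(x):
        return math.floor(n * f(x)) / n
    return f_n

# Hypothetical sample function; any real-valued f could be used instead.
def f(x):
    return math.sin(3 * x) + 0.5 * x

sample = [i / 37 for i in range(-200, 201)]
for n in (1, 10, 100, 1000):
    f_n = simple_approx(f, n)
    worst = max(f(x) - f_n(x) for x in sample)
    print(f"n = {n:4d}: max of f - f_n on the sample = {worst:.6f} (< 1/n)")
```

Each $f_n$ takes only values of the form $m/n$. The bound $1/n$ does not depend on $x$, which is what makes the convergence uniform.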

The next few theorems show that the class of measurable functions is 
closed under the usual algebraic operations. 

Theorem 6. If $f$ and $g$ are measurable, then so is $f + g$.

Proof. First let $f$ and $g$ be simple functions, taking values $y_1, y_2, \ldots$ and $z_1, z_2, \ldots$, respectively. Then the sum $h = f + g$ can only take the values $c_{ij} = y_i + z_j$, where each such value is taken on a set of the form
$$\{x : h(x) = c_{ij}\} = \bigcup_{y_i + z_j = c_{ij}} \left(\{x : f(x) = y_i\} \cap \{x : g(x) = z_j\}\right). \tag{3}$$
There are no more than countably many values $c_{ij}$ of the function $h = f + g$, and moreover each set $\{x : h(x) = c_{ij}\}$ is measurable, since the right-hand side of (3) is clearly measurable. Therefore $h = f + g$ is a simple function.

Now let $f$ and $g$ be arbitrary measurable functions, and let $\{f_n\}$ and $\{g_n\}$ be sequences of simple functions converging uniformly to $f$ and $g$, respectively, as in the proof of Theorem 5. Then the sequence of simple functions $\{f_n + g_n\}$ converges uniformly to $f + g$, and hence $f + g$ is measurable, by Theorem 5. ∎
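Formula (3) can be checked on a toy model. In the sketch below, a finite list of points stands in for $X$ and functions are plain dictionaries. Both are assumptions made for illustration only; measurability plays no role on a finite set. The level sets of $h = f + g$ are assembled as in (3), from the intersections $\{f = y_i\} \cap \{g = z_j\}$. Different pairs $(y_i, z_j)$ can produce the same sum, which is why (3) is a union.

```python
from collections import defaultdict

def level_sets(func, points):
    """Map each value v taken by func to the set {x : func(x) = v}."""
    sets = defaultdict(set)
    for x in points:
        sets[func[x]].add(x)
    return dict(sets)

def sum_level_sets(f, g, points):
    """Level sets of h = f + g built as in (3): the set where h = c is the
    union of {f = y_i} & {g = z_j} over all pairs with y_i + z_j = c."""
    F = level_sets(f, points)
    G = level_sets(g, points)
    H = defaultdict(set)
    for y, F_i in F.items():
        for z, G_j in G.items():
            piece = F_i & G_j
            if piece:                # skip empty intersections
                H[y + z] |= piece
    return dict(H)

points = list(range(12))
f = {x: x % 3 for x in points}   # values 0, 1, 2
g = {x: x % 4 for x in points}   # values 0, 1, 2, 3 (sums such as 1 + 0 = 0 + 1 coincide)
h = {x: f[x] + g[x] for x in points}
print(sum_level_sets(f, g, points) == level_sets(h, points))  # True
```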

Theorem 7. If $f$ is measurable, then so is $cf$, where $c$ is an arbitrary constant.

Proof. Obviously, the product of a simple function and a constant is again simple. But if $\{f_n\}$ is a sequence of simple functions converging uniformly to $f$, then $\{cf_n\}$ converges uniformly to $cf$, and hence $cf$ is measurable, by Theorem 5. ∎

Theorem 8. If $f$ and $g$ are measurable, then so is $f - g$.

Proof. An immediate consequence of Theorems 6 and 7. ∎

Theorem 9. If $f$ and $g$ are measurable, then so is $fg$.

Proof. Clearly,
$$fg = \frac{1}{4}\left[(f + g)^2 - (f - g)^2\right].$$
But the expression on the right is a measurable function, by Theorems 6-8 and the fact that the square of a measurable function is measurable (this follows from the corollary to Theorem 3). ∎

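Theorem 10. If $f$ is measurable, then so is $1/f$, provided $f$ does not vanish.

Proof. We have
$$\left\{x : \frac{1}{f(x)} < c\right\} = \left\{x : f(x) > \frac{1}{c}\right\} \cup \{x : f(x) < 0\}$$
if $c > 0$,
$$\left\{x : \frac{1}{f(x)} < c\right\} = \left\{x : \frac{1}{c} < f(x) < 0\right\}$$
if $c < 0$, and
$$\left\{x : \frac{1}{f(x)} < c\right\} = \{x : f(x) < 0\}$$
if $c = 0$. But in each case the set on the right is measurable. ∎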
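A sign error is easy to make in these three set identities, so here is a brute-force check. The sketch compares membership in the left- and right-hand sides for many rational values of $f(x)$ and $c$. The grid of test values is an arbitrary assumption. Exact `Fraction` arithmetic avoids rounding at the boundary points $f(x) = 1/c$.

```python
from fractions import Fraction

def in_lhs(fx, c):
    """Is x in {x : 1/f(x) < c}?  (fx = f(x) is assumed nonzero)"""
    return 1 / fx < c

def in_rhs(fx, c):
    """Membership in the set on the right-hand side, case by case as in the proof."""
    if c > 0:
        return fx > 1 / c or fx < 0
    if c < 0:
        return 1 / c < fx < 0
    return fx < 0  # the case c == 0

grid = [Fraction(k, 4) for k in range(-20, 21)]
f_values = [v for v in grid if v != 0]
mismatches = [(fx, c) for fx in f_values for c in grid if in_lhs(fx, c) != in_rhs(fx, c)]
print(len(mismatches))  # 0
```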

Corollary. If $f$ and $g$ are measurable, then so is $f/g$, provided $g$ does not vanish.

Proof. An immediate consequence of Theorems 9 and 10. ∎

28.3. Equivalent functions. The values of a function can often be neglected on a set of measure zero. This suggests

Definition 2. Two functions $f$ and $g$ defined on the same set are said to be equivalent (with respect to a measure $\mu$) if
$$\mu\{x : f(x) \ne g(x)\} = 0.$$

A property is said to hold almost everywhere (on $E$) if it holds at all points (of $E$) except possibly on a set of measure zero. Thus two functions $f$ and $g$ are said to be equivalent (written $f \sim g$) if they coincide almost everywhere.

Theorem 11. Let $f$ and $g$ be continuous on an interval $E$, and suppose $f$ and $g$ are equivalent (with respect to Lebesgue measure $\mu$ on the line). Then $f$ and $g$ coincide.

Proof. Suppose $f(x_0) \ne g(x_0)$ at some point $x_0 \in E$, so that $f(x_0) - g(x_0) \ne 0$. Since $f - g$ is continuous, there is a neighborhood of $x_0$ (possibly one-sided) in which $f - g$ is nonzero. This neighborhood has positive measure, and hence
$$\mu\{x : f(x) \ne g(x)\} > 0,$$
i.e., $f$ and $g$ cannot be equivalent, contrary to hypothesis. ∎

Remark. Thus two continuous functions cannot be equivalent if they differ at even a single point. However, discontinuous functions can obviously be equivalent without being identical. For example, the Dirichlet function
$$D(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational} \end{cases}$$
is equivalent to the function $g(x) \equiv 0$ (recall Problem 3, p. 268).

Theorem 12. A function $f$ equivalent to a measurable function $g$ is itself measurable.

Proof. It follows from Definition 2 that the sets $\{x : f(x) < c\}$ and $\{x : g(x) < c\}$ can differ only by a set of measure zero. Hence if the second set is measurable, so is the first set. The proof is now an immediate consequence of Theorem 1. ∎

28.4. Convergence almost everywhere. Since the behavior of measurable functions on sets of measure zero is often unimportant, it is natural to introduce the following generalization of the ordinary notion of convergence of a sequence of functions:

Definition 3. A sequence of functions $\{f_n(x)\}$ defined on a space $X$ is said to converge almost everywhere to a function $f(x)$ if
$$\lim_{n\to\infty} f_n(x) = f(x) \tag{4}$$
for almost all $x \in X$, i.e., if the set of points for which (4) fails to hold is of measure zero.

Example. The sequence $\{f_n(x)\} = \{(-x)^n\}$ defined on $[0, 1]$ converges almost everywhere to the function $f(x) \equiv 0$, in fact everywhere except at the point $x = 1$.

Theorem 2 now has the following generalization: 

Theorem 2'. Let $\{f_n\}$ be a sequence of $\mu$-measurable functions on $X$, and let $f$ be a function on $X$ such that
$$f(x) = \lim_{n\to\infty} f_n(x) \tag{5}$$
almost everywhere on $X$. Then $f$ is itself $\mu$-measurable, provided $\mu$ is complete.²

Proof. If $A$ is the set on which (5) holds, then $\mu(X - A) = 0$. The function $f$ is measurable on $A$, by Theorem 2, and also on $X - A$, since every function is measurable on a set of measure zero if $\mu$ is complete (why?). Hence $f$ is measurable on the whole set $X = A \cup (X - A)$. ∎

28.5. Egorov’s theorem. The following important theorem shows the 
relation between the concepts of convergence almost everywhere and uniform 
convergence: 

Theorem 12 (Egorov). Let $\{f_n\}$ be a sequence of measurable functions converging almost everywhere on a measurable set $E$ to a function $f$. Then, given any $\delta > 0$, there exists a measurable set $E_\delta \subset E$ such that

1) $\mu(E_\delta) > \mu(E) - \delta$;

2) $\{f_n\}$ converges uniformly to $f$ on $E_\delta$.

Proof. The function $f$ is measurable, by Theorem 2'. Let
$$E_n^m = \bigcap_{i \ge n}\left\{x : |f_i(x) - f(x)| < \frac{1}{m}\right\}. \tag{6}$$
Thus, for fixed $m$ and $n$, $E_n^m$ is the set of all points $x$ such that
$$|f_i(x) - f(x)| < \frac{1}{m}$$
holds for all $i \ge n$. Moreover, let
$$E^m = \bigcup_{n=1}^\infty E_n^m.$$
It follows from (6) that
$$E_1^m \subset E_2^m \subset \cdots \subset E_n^m \subset \cdots,$$
and hence, by the corollary to Theorem 11, p. 267,³ given any $m$ and any $\delta > 0$, there is an $n_0(m)$ such that
$$\mu\left(E^m - E^m_{n_0(m)}\right) < \frac{\delta}{2^m}. \tag{7}$$
Let
$$E_\delta = \bigcap_{m=1}^\infty E^m_{n_0(m)}.$$


² See Problem 7, p. 280.

³ See also Problem 3, p. 280.
Then $E_\delta$ satisfies the two conditions of the theorem. The fact that the sequence $\{f_n\}$ is uniformly convergent on $E_\delta$ is almost obvious, since if $x \in E_\delta$, then, given any $m = 1, 2, \ldots$,
$$|f_i(x) - f(x)| < \frac{1}{m}$$
for every $i \ge n_0(m)$.

To verify condition 1), we now estimate the measure of the set $E - E_\delta$, noting first that $\mu(E - E^m) = 0$ for every $m$. In fact, if $x_0 \in E - E^m$, then there are arbitrarily large values of $i$ such that
$$|f_i(x_0) - f(x_0)| \ge \frac{1}{m},$$
which means that the sequence $\{f_n\}$ cannot converge to $f$ at the point $x_0$. Therefore $\mu(E - E^m) = 0$, as asserted, since $\{f_n\}$ converges to $f$ almost everywhere, by hypothesis. It follows from (7) that
$$\mu\left(E - E^m_{n_0(m)}\right) = \mu\left(E^m - E^m_{n_0(m)}\right) < \frac{\delta}{2^m}.$$
Therefore
$$\mu(E - E_\delta) = \mu\left(E - \bigcap_{m=1}^\infty E^m_{n_0(m)}\right) = \mu\left(\bigcup_{m=1}^\infty \left(E - E^m_{n_0(m)}\right)\right) \le \sum_{m=1}^\infty \mu\left(E - E^m_{n_0(m)}\right) < \sum_{m=1}^\infty \frac{\delta}{2^m} = \delta,$$
and hence $\mu(E_\delta) > \mu(E) - \delta$. ∎
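Egorov's theorem can be seen at work on the sequence $f_n(x) = (-x)^n$ from the example in Sec. 28.4. This sequence converges to $0$ on $[0, 1)$, but not uniformly there, since $\sup_{0 \le x < 1}|(-x)^n| = 1$ for every $n$. Removing a short interval near $x = 1$ restores uniform convergence. On $[0, r]$ with $r < 1$ we have $\sup|f_n| = r^n \to 0$, and taking $r = 1 - \delta/2$ gives a set of measure greater than $1 - \delta$. The sketch below (helper names are hypothetical) finds, for a given $m$, the first $n$ with $r^n < 1/m$. This $n$ plays the role of $n_0(m)$ in the proof.

```python
def n0(m, r):
    """Smallest n with sup_{0 <= x <= r} |(-x)**n| = r**n < 1/m (assumes 0 <= r < 1)."""
    n = 1
    while r ** n >= 1 / m:
        n += 1
    return n

def sup_on_grid(n, r, points=1001):
    """Max of |(-x)**n| = x**n over an evenly spaced grid on [0, r] (grid includes r)."""
    return max((r * j / (points - 1)) ** n for j in range(points))

delta = 0.2
r = 1 - delta / 2        # E_delta = [0, r] has measure 0.9 > 1 - delta = 0.8
for m in (10, 100, 1000):
    n = n0(m, r)
    print(m, n, sup_on_grid(n, r) < 1 / m)

# Close to x = 1 the terms are still near 1 in absolute value, so [0, 1) itself is not enough:
print(abs((-(1 - 1e-6)) ** n0(10, r)))
```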

Problem 1. Prove that the Dirichlet function
$$D(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational} \end{cases}$$
is measurable on every interval $[a, b]$.

Problem 2 . Do the same for the function 

if x = - is rational, 

q 

if x is irrational. 

Problem 3. Suppose $f(x)$ is measurable on $[a, b]$. Is $g(x) = e^{f(x)}$ measurable on $[a, b]$?

Problem 4. Prove that if $f$ is measurable, then so is $|f|$.

Problem 5. Let $\{f_n\}$ be a sequence of measurable functions converging almost everywhere to a function $f$. Prove that $\{f_n\}$ converges almost everywhere to a function $g$ if and only if $f$ and $g$ are equivalent.
Problem 6. A sequence $\{f_n\}$ of $\mu$-measurable functions is said to converge in measure to a function $f$ if
$$\lim_{n\to\infty}\mu\{x : |f_n(x) - f(x)| \ge \sigma\} = 0$$
for every $\sigma > 0$. Prove that if a sequence $\{f_n\}$ of measurable functions converges to $f$ almost everywhere, then it converges to $f$ in measure.

Hint. Let $A$ be the set (of measure zero) on which $\{f_n\}$ fails to converge to $f$, and let
$$E_k(\sigma) = \{x : |f_k(x) - f(x)| \ge \sigma\}, \qquad R_n(\sigma) = \bigcup_{k=n}^\infty E_k(\sigma), \qquad M = \bigcap_{n=1}^\infty R_n(\sigma).$$
Then the sets $R_n(\sigma)$ are all measurable (why?), and $\mu(R_n(\sigma)) \to \mu(M)$ as $n \to \infty$, since $R_1(\sigma) \supset R_2(\sigma) \supset \cdots$. Prove that $M \subset A$ and hence that $\mu(M) = 0$ (as always, we assume that $\mu$ is complete). It follows that $\mu(R_n(\sigma)) \to 0$ as $n \to \infty$. Now use the fact that $E_n(\sigma) \subset R_n(\sigma)$.

Problem 7. Let $\{f_n\}$ be a sequence of measurable functions converging in measure to a function $f$. Prove that $\{f_n\}$ converges in measure to a function $g$ if and only if $f$ and $g$ are equivalent.

Problem 8. Given any positive integer $k$, consider the functions
$$f_i^{(k)}(x) = \begin{cases} 1 & \text{if } \dfrac{i-1}{k} < x \le \dfrac{i}{k}, \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \ldots, k),$$
defined on the half-open interval $(0, 1]$. Show that the sequence
$$f_1^{(1)}, f_1^{(2)}, f_2^{(2)}, \ldots, f_1^{(k)}, f_2^{(k)}, \ldots, f_k^{(k)}, \ldots$$
converges in measure to zero, but does not converge at any point whatsoever.

Comment. Thus the converse of the proposition in Problem 6 is false. Instead we have the weaker proposition considered in the next problem.
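The sequence of Problem 8 can be generated explicitly. The sketch below is an illustration under our own encoding: a pair `(i, k)` stands for $f_i^{(k)}$, the indicator of $((i-1)/k, i/k]$. The measure of the set where a term equals $1$ is $1/k \to 0$. Meanwhile, every point $x \in (0, 1]$ lies in exactly one interval of each block $k$, so both values $0$ and $1$ recur at $x$ infinitely often (for $k \ge 2$). Exact `Fraction` arithmetic handles the endpoints $i/k$ correctly.

```python
from fractions import Fraction

def typewriter(N):
    """First N terms f_1^(1), f_1^(2), f_2^(2), f_1^(3), ..., encoded as pairs (i, k)."""
    terms, k = [], 1
    while len(terms) < N:
        for i in range(1, k + 1):
            terms.append((i, k))
            if len(terms) == N:
                break
        k += 1
    return terms

def value(term, x):
    """f_i^(k)(x): 1 on the half-open interval ((i-1)/k, i/k], 0 elsewhere on (0, 1]."""
    i, k = term
    return 1 if Fraction(i - 1, k) < x <= Fraction(i, k) else 0

def support_measure(term):
    """Lebesgue measure of {x : f_i^(k)(x) = 1}, i.e. the length 1/k."""
    return Fraction(1, term[1])

terms = typewriter(5050)                  # blocks k = 1, ..., 100
x = Fraction(1, 3)
print(support_measure(terms[-1]))         # 1/100
print(sum(value(t, x) for t in terms))    # one hit per block: 100
```

The subsequence $f_1^{(1)}, f_1^{(2)}, f_1^{(3)}, \ldots$ (the indicators of $(0, 1/k]$) does converge to zero at every point of $(0, 1]$. It is the kind of subsequence promised by Problem 9.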

Problem 9. Prove that if a sequence $\{f_n\}$ of functions converges to $f$ in measure, then it contains a subsequence $\{f_{n_k}\}$ converging to $f$ almost everywhere.

Hint. Let $\{\sigma_n\}$ be a sequence of positive numbers such that
$$\lim_{n\to\infty}\sigma_n = 0,$$
and let $\{\varepsilon_n\}$ be a sequence of positive numbers such that
$$\sum_{n=1}^\infty \varepsilon_n < \infty.$$
Let $\{n_k\}$ be a sequence of positive integers such that $n_k > n_{k-1}$ and
$$\mu\{x : |f_{n_k}(x) - f(x)| \ge \sigma_k\} < \varepsilon_k \quad (k = 1, 2, \ldots).$$
Moreover, let
$$P_i = \bigcup_{k=i}^\infty \{x : |f_{n_k}(x) - f(x)| \ge \sigma_k\}, \qquad Q = \bigcap_{i=1}^\infty P_i.$$
Then $\mu(P_i) \to \mu(Q)$ as $i \to \infty$, since $P_1 \supset P_2 \supset \cdots$. On the other hand,
$$\mu(P_i) \le \sum_{k=i}^\infty \varepsilon_k,$$
and hence $\mu(P_i) \to 0$, so that $\mu(Q) = 0$. Now show that $\{f_{n_k}\}$ converges to $f$ on $E - Q$.

Problem 10. Prove that a function $f$ defined on a closed interval $[a, b]$ is $\mu$-measurable if and only if, given any $\varepsilon > 0$, there is a continuous function $\varphi$ on $[a, b]$ such that $\mu\{x : f(x) \ne \varphi(x)\} < \varepsilon$.

Hint. Use Egorov’s theorem. 

Comment. This result, known as Luzin’s theorem, shows that a measurable 
function “can be made continuous by altering it on a set of arbitrarily small 
measure.” 


29. The Lebesgue Integral 

The concept of the Riemann integral, familiar from calculus, applies only to functions which are either continuous or else do not have “too many” points of discontinuity. Hence we cannot form the Riemann integral of a general measurable function $f$. In fact, $f$ may be discontinuous everywhere, or it may even be meaningless to talk about the continuity of $f$ in the case where $f$ is defined on an abstract set. For such functions, there is another fully developed notion of the integral, due to Lebesgue, which is more flexible than the notion of the Riemann integral.

Let $f$ be a function defined on a closed interval $[a, b]$ of the $x$-axis. Then to form the Riemann integral of $f$, we divide $[a, b]$ into many subintervals, thereby grouping together neighboring points of the $x$-axis. On the other hand, as we will see below, the Lebesgue integral is formed by grouping together points of the $x$-axis at which the function $f$ takes neighboring values. In other words, the key idea of the theory of Lebesgue integration is to partition the range of the function $f$ rather than its domain. This immediately makes it possible to extend the notion of integral to a very large class of functions.
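Here is a small numerical sketch of what it means to partition the range, using the hypothetical example $f(x) = x^2$ on $[0, 1]$ with Lebesgue measure. The range is cut at the levels $y_k = k/N$. The set $\{x : y_k \le x^2 < y_{k+1}\}$ is the interval $[\sqrt{y_k}, \sqrt{y_{k+1}})$, whose measure is its length. The sum $\sum_k y_k\,\mu\{x : y_k \le f(x) < y_{k+1}\}$ therefore approaches $\int_0^1 x^2\,dx = 1/3$. For comparison, the sketch also computes a Riemann sum, which partitions the domain instead.

```python
import math

def lebesgue_lower_sum(N):
    """Range partition for f(x) = x**2 on [0, 1]: sum of y_k * mu(level set)."""
    total = 0.0
    for k in range(N):
        y_k = k / N
        # {x in [0, 1] : k/N <= x**2 < (k+1)/N} = [sqrt(k/N), sqrt((k+1)/N))
        measure = math.sqrt((k + 1) / N) - math.sqrt(k / N)
        total += y_k * measure
    return total

def riemann_left_sum(N):
    """Domain partition: left-endpoint Riemann sum of x**2 over N equal subintervals."""
    return sum((j / N) ** 2 for j in range(N)) / N

for N in (10, 100, 1000):
    print(N, lebesgue_lower_sum(N), riemann_left_sum(N))
```

On this continuous example both sums work equally well. Grouping by values pays off for functions such as the Dirichlet function, whose level sets are not intervals.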

Another advantage of the Lebesgue integral is that it is constructed in exactly the same way for functions defined on an abstract “measure space” (an arbitrary set $X$ equipped with a measure) as for functions defined on the real line. This is to be contrasted with the situation for the Riemann integral, which is first introduced for functions of a single real variable and then extended, with suitable modifications, to the case of functions of several real variables, but fails to make any sense at all for functions defined on an abstract measure space.

In what follows, unless the contrary is explicitly stated, we will consider a $\sigma$-additive measure $\mu$ defined on a Borel algebra of subsets of a set $X$, with $X$ as the unit. We will assume that all sets under consideration are $\mu$-measurable, and that all functions under consideration are defined and $\mu$-measurable on $X$.

29.1. Definition and basic properties of the Lebesgue integral. Let $f$ be a simple function, i.e., a $\mu$-measurable function taking no more than countably many distinct values
$$y_1, y_2, \ldots, y_n, \ldots \tag{1}$$
Then by the (Lebesgue) integral of $f$ over the set $A$, denoted by
$$\int_A f(x)\,d\mu,$$
we mean the quantity
$$\sum_n y_n\,\mu(A_n), \tag{2}$$
where
$$A_n = \{x : x \in A,\ f(x) = y_n\},$$
provided the series (2) is absolutely convergent. If the Lebesgue integral of $f$ exists, we say that $f$ is integrable or summable (with respect to the measure $\mu$) on the set $A$.

Example. Obviously,
$$\int_A 1\,d\mu = \int_A d\mu = \mu(A).$$

We now get rid of the restriction that the numbers (1) be distinct:

Lemma. Given a simple function $f$ defined on a set $A$, suppose $A$ is a union
$$A = \bigcup_k B_k$$
of pairwise disjoint sets $B_k$ such that $f$ takes only one value $c_k$ on $B_k$. Then $f$ is integrable on $A$ if and only if the series
$$\sum_k c_k\,\mu(B_k) \tag{3}$$
is absolutely convergent, in which case
$$\int_A f(x)\,d\mu = \sum_k c_k\,\mu(B_k).$$

Proof. Each set
$$A_n = \{x : x \in A,\ f(x) = y_n\}$$
is the union of the sets $B_k$ for which $c_k = y_n$. Therefore⁴
$$\sum_n y_n\,\mu(A_n) = \sum_n y_n \sum_{c_k = y_n} \mu(B_k) = \sum_k c_k\,\mu(B_k).$$
Moreover, since $\mu$ is nonnegative, we have
$$\sum_n |y_n|\,\mu(A_n) = \sum_n |y_n| \sum_{c_k = y_n} \mu(B_k) = \sum_k |c_k|\,\mu(B_k),$$
so that the series (2) is absolutely convergent if and only if the series (3) is absolutely convergent. ∎
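By the lemma, the integral of a simple function may be computed from any decomposition into pieces on which $f$ is constant, even when values repeat. Here is a minimal sketch on a hypothetical finite decomposition, with exact fractions standing in for the measures $\mu(B_k)$. It compares the series (3) with the series (2) obtained after merging pieces with equal values.

```python
from collections import defaultdict
from fractions import Fraction

def integral_from_pieces(pieces):
    """pieces: list of (c_k, mu(B_k)) over disjoint B_k; values c_k may repeat.
    Returns sum_k c_k * mu(B_k), the series (3)."""
    return sum(c * m for c, m in pieces)

def merge_values(pieces):
    """Combine the B_k with c_k = y_n into A_n, giving pairs (y_n, mu(A_n)) as in (2)."""
    merged = defaultdict(Fraction)
    for c, m in pieces:
        merged[c] += m
    return list(merged.items())

# Hypothetical pieces: the value 2 occurs on two different sets B_k.
pieces = [(2, Fraction(1, 4)), (-1, Fraction(1, 8)), (2, Fraction(1, 8)), (3, Fraction(1, 2))]
print(integral_from_pieces(pieces), integral_from_pieces(merge_values(pieces)))  # 17/8 17/8
```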

Theorem 1. Let $f$ and $g$ be simple functions integrable on a set $A$, and let $k$ be any constant. Then $f + g$ and $kf$ are integrable over $A$, and
$$\int_A [f(x) + g(x)]\,d\mu = \int_A f(x)\,d\mu + \int_A g(x)\,d\mu, \tag{4}$$
$$\int_A [kf(x)]\,d\mu = k\int_A f(x)\,d\mu. \tag{5}$$

Proof. Suppose $f$ takes distinct values $y_i$ on sets $F_i \subset A$, while $g$ takes distinct values $z_j$ on sets $G_j \subset A$, where $i, j = 1, 2, \ldots$. Then
$$\int_A f(x)\,d\mu = \sum_i y_i\,\mu(F_i), \tag{6}$$
$$\int_A g(x)\,d\mu = \sum_j z_j\,\mu(G_j). \tag{7}$$
Clearly, $f + g$ takes the values $c_{ij} = y_i + z_j$ (not necessarily distinct) on the pairwise disjoint sets $B_{ij} = F_i \cap G_j$. It follows from
$$\mu(F_i) = \sum_j \mu(F_i \cap G_j), \qquad \mu(G_j) = \sum_i \mu(F_i \cap G_j)$$

⁴ The notation $\sum_{c_k = y_n}$ calls for the sum over all $k$ such that $c_k = y_n$.
and the absolute convergence of the series (6) and (7) that the series
$$\sum_i \sum_j c_{ij}\,\mu(B_{ij}) = \sum_i \sum_j (y_i + z_j)\,\mu(F_i \cap G_j)$$
is absolutely convergent. Hence, by the lemma, $f + g$ is integrable on $A$ and
$$\int_A [f(x) + g(x)]\,d\mu = \sum_i \sum_j (y_i + z_j)\,\mu(F_i \cap G_j) = \sum_i y_i\,\mu(F_i) + \sum_j z_j\,\mu(G_j). \tag{8}$$
Comparing (6)-(8), we get (4). The proof of (5) is trivial. ∎

Theorem 2. Let $f$ be a bounded simple function on $A$, where $|f(x)| \le M$ if $x \in A$. Then $f$ is integrable on $A$ and
$$\left|\int_A f(x)\,d\mu\right| \le M\mu(A).$$

Proof. If $f$ takes values $y_n$ on sets $A_n \subset A$ $(n = 1, 2, \ldots)$, then
$$\left|\int_A f(x)\,d\mu\right| = \left|\sum_n y_n\,\mu(A_n)\right| \le \sum_n |y_n|\,\mu(A_n) \le M\sum_n \mu(A_n) = M\mu(A),$$
where we have incidentally proved the integrability of $f$ on $A$ (how?). ∎

Next we remove the restriction that $f$ be a simple function:

Definition. A measurable function $f$ is said to be integrable (or summable) on a set $A$ if there exists a sequence $\{f_n\}$ of integrable simple functions converging uniformly to $f$ on $A$. The limit
$$\lim_{n\to\infty}\int_A f_n(x)\,d\mu \tag{9}$$
is then called the (Lebesgue) integral of $f$ over the set $A$, denoted by
$$\int_A f(x)\,d\mu.$$

This definition relies tacitly on the following conditions being met:

1) The limit (9) exists (and is finite) for any uniformly convergent sequence of integrable simple functions on $A$;

2) For any given $f$, this limit is independent of the choice of the sequence $\{f_n\}$;

3) For simple functions, the definitions of integrability and of the integral reduce to those given on p. 294.


SEC. 29 


THE LEBESGUE INTEGRAL 297 


All these conditions are indeed satisfied. Condition 1) is an immediate consequence of the estimate
$$\left|\int_A f_m(x)\,d\mu - \int_A f_n(x)\,d\mu\right| = \left|\int_A [f_m(x) - f_n(x)]\,d\mu\right| \le \mu(A)\sup_{x\in A}|f_m(x) - f_n(x)|,$$
implied by Theorems 1 and 2. To prove 2), suppose the sequences $\{f_n\}$ and $\{f_n^*\}$ both converge uniformly to $f$, but
$$\lim_{n\to\infty}\int_A f_n(x)\,d\mu \ne \lim_{n\to\infty}\int_A f_n^*(x)\,d\mu.$$
Let $\{\varphi_n\}$ be the sequence
$$f_1, f_1^*, f_2, f_2^*, \ldots, f_n, f_n^*, \ldots$$
Then $\{\varphi_n\}$ converges uniformly to $f$, but
$$\lim_{n\to\infty}\int_A \varphi_n(x)\,d\mu$$
fails to exist, contrary to condition 1). Finally, to prove 3), if $f$ is simple, we need only consider the trivial sequence $\{f_n\}$ with general term $f_n = f$.

Theorem 1'. Theorem 1 continues to hold if $f$ and $g$ are arbitrary measurable functions integrable on $A$.

Proof. An immediate consequence of Theorem 1, after taking suitable uniform limits of integrable simple functions. ∎

Theorem 3. If $\varphi$ is nonnegative and integrable on $A$ and if $|f(x)| \le \varphi(x)$ almost everywhere on $A$, then $f$ is also integrable on $A$ and
$$\left|\int_A f(x)\,d\mu\right| \le \int_A \varphi(x)\,d\mu. \tag{10}$$

Proof. If $f$ and $\varphi$ are simple functions, then, by subtracting a set of measure zero from $A$, we get a set $A'$ which can be represented as a finite or countable union
$$A' = \bigcup_n A_n$$
of subsets $A_n \subset A'$ such that
$$f(x) = a_n, \qquad \varphi(x) = b_n$$
for all $x \in A_n$ and
$$|a_n| \le b_n \quad (n = 1, 2, \ldots).$$
Since $\varphi$ is integrable on $A$, we have
$$\sum_n |a_n|\,\mu(A_n) \le \sum_n b_n\,\mu(A_n) = \int_{A'}\varphi(x)\,d\mu = \int_A \varphi(x)\,d\mu \tag{11}$$
(see Problem 3b). Therefore $f$ is also integrable on $A$, and
$$\left|\int_A f(x)\,d\mu\right| = \left|\int_{A'} f(x)\,d\mu\right| = \left|\sum_n a_n\,\mu(A_n)\right| \le \sum_n |a_n|\,\mu(A_n). \tag{12}$$
Comparing (11) and (12), we get (10).

In the case where $f$ and $\varphi$ are arbitrary measurable functions, let $\{f_n\}$ and $\{\varphi_n\}$ be sequences of simple functions converging uniformly to $f$ and $\varphi$, respectively, constructed in the same way as in the proof of Theorem 5, p. 286. Then clearly
$$|f_n(x)| \le \varphi_n(x) \quad (n = 1, 2, \ldots)$$
on $A'$. Moreover each $\varphi_n$ is integrable, since $\varphi$ is integrable by hypothesis. It follows that each $f_n$ and hence $f$ itself is integrable, where
$$\left|\int_A f_n(x)\,d\mu\right| \le \int_A \varphi_n(x)\,d\mu.$$
Taking the limit as $n \to \infty$, we again get (10). ∎

Corollary. If $f$ is bounded and measurable on $A$, then $f$ is integrable on $A$.

Proof. Choose $\varphi(x) \equiv M$, where
$$M = \sup_{x\in A}|f(x)|.$$
∎


29.2. Some key theorems. We now prove some important properties of the Lebesgue integral, regarded as a set function
$$F(A) = \int_A f(x)\,d\mu \tag{13}$$
defined on a system of measurable sets (with the integrand $f$ held fixed).

Theorem 4. Let
$$A = \bigcup_n A_n$$
be a finite or countable union of pairwise disjoint sets $A_n$, and suppose $f$ is integrable on $A$. Then $f$ is integrable on each $A_n$ and
$$\int_A f(x)\,d\mu = \sum_n \int_{A_n} f(x)\,d\mu, \tag{14}$$
where the series on the right is absolutely convergent.

Proof. First let $f$ be a simple function, taking the values $y_1, y_2, \ldots$, and let
$$B_k = \{x : x \in A,\ f(x) = y_k\}, \qquad B_{nk} = \{x : x \in A_n,\ f(x) = y_k\}.$$
Then
$$\int_A f(x)\,d\mu = \sum_k y_k\,\mu(B_k) = \sum_k y_k \sum_n \mu(B_{nk}) = \sum_n \sum_k y_k\,\mu(B_{nk}) = \sum_n \int_{A_n} f(x)\,d\mu. \tag{15}$$
Since $f$ is integrable on $A$, the series
$$\sum_k y_k\,\mu(B_k)$$
converges absolutely, and hence so do the other series in (15). (Here we use the nonnegativity of the measure $\mu$.) In particular, $f$ is integrable on each set $A_n$.

Next let $f$ be an arbitrary measurable function integrable on $A$. Then, given any $\varepsilon > 0$, there is a simple function $g$ integrable on $A$ such that
$$|f(x) - g(x)| < \varepsilon \quad (x \in A). \tag{16}$$
For $g$ we have
$$\int_A g(x)\,d\mu = \sum_n \int_{A_n} g(x)\,d\mu, \tag{17}$$
as just shown, where $g$ is integrable on each $A_n$ and the series converges absolutely. Hence, by (16), $f$ is also integrable on each $A_n$ and
$$\sum_n \left|\int_{A_n} f(x)\,d\mu - \int_{A_n} g(x)\,d\mu\right| \le \sum_n \varepsilon\,\mu(A_n) = \varepsilon\,\mu(A),$$
$$\left|\int_A f(x)\,d\mu - \int_A g(x)\,d\mu\right| \le \varepsilon\,\mu(A),$$
which, together with (17), implies the absolute convergence of the series
$$\sum_n \int_{A_n} f(x)\,d\mu$$
and the estimate
$$\left|\int_A f(x)\,d\mu - \sum_n \int_{A_n} f(x)\,d\mu\right| \le 2\varepsilon\,\mu(A). \tag{18}$$
But (18) implies (14), since $\varepsilon > 0$ is arbitrary. ∎

Corollary. If $f$ is integrable on $A$, then $f$ is integrable on every measurable subset $A' \subset A$.

Proof. Think of $A$ as the union of the disjoint sets $A'$ and $A - A'$. ∎

Remark. A succinct way of expressing the property (14) is to say that the set function (13) is $\sigma$-additive.


Theorem 5 (Chebyshev’s inequality). If $f$ is nonnegative and integrable on $A$, then
$$\mu\{x : x \in A,\ f(x) \ge c\} \le \frac{1}{c}\int_A f(x)\,d\mu$$
for every $c > 0$.

Proof. If
$$A' = \{x : x \in A,\ f(x) \ge c\},$$
then
$$\int_A f(x)\,d\mu = \int_{A'} f(x)\,d\mu + \int_{A - A'} f(x)\,d\mu \ge \int_{A'} f(x)\,d\mu \ge c\,\mu(A')$$
(see Problem 4a). ∎
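Chebyshev's inequality can be checked directly on a hypothetical finite measure space with four points and the masses below. The sketch computes both sides for several thresholds $c$, using exact fractions.

```python
from fractions import Fraction

# Hypothetical finite measure space: point -> mass mu({x}); total mass 1.
masses = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
f = {0: 0, 1: 1, 2: 4, 3: 9}  # a nonnegative function on these points

def integral(f, masses):
    """Integral of f: the sum of f(x) * mu({x}) over all points."""
    return sum(f[x] * masses[x] for x in masses)

def tail_measure(f, masses, c):
    """mu{x : f(x) >= c}."""
    return sum((masses[x] for x in masses if f[x] >= c), Fraction(0))

I = integral(f, masses)
for c in (1, 2, 4, 9, 10):
    print(c, tail_measure(f, masses, c), I / c)
```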

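Corollary. If
$$\int_A |f(x)|\,d\mu = 0,$$
then $f(x) = 0$ almost everywhere.

Proof. By Chebyshev’s inequality,
$$\mu\left\{x : x \in A,\ |f(x)| \ge \frac{1}{n}\right\} \le n\int_A |f(x)|\,d\mu = 0$$
for all $n = 1, 2, \ldots$. Therefore
$$\mu\{x : x \in A,\ f(x) \ne 0\} \le \sum_{n=1}^\infty \mu\left\{x : x \in A,\ |f(x)| \ge \frac{1}{n}\right\} = 0.$$
∎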


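Theorem 6. If $f$ is integrable on a set $A$, then, given any $\varepsilon > 0$, there is a $\delta > 0$ such that
$$\left|\int_E f(x)\,d\mu\right| < \varepsilon$$
for every measurable set $E \subset A$ of measure less than $\delta$.

Proof. The proof is immediate if $f$ is bounded, since then
$$\left|\int_E f(x)\,d\mu\right| \le \int_E |f(x)|\,d\mu \le \mu(E)\sup_{x\in E}|f(x)|$$
(see Problem 4c). In the general case, let
$$A_n = \{x : x \in A,\ n \le |f(x)| < n + 1\}, \qquad B_N = \bigcup_{n=0}^N A_n, \qquad C_N = A - B_N.$$
Then, by Theorem 4,
$$\int_A |f(x)|\,d\mu = \sum_{n=0}^\infty \int_{A_n} |f(x)|\,d\mu,$$
so we can choose $N$ such that
$$\int_{C_N} |f(x)|\,d\mu = \sum_{n=N+1}^\infty \int_{A_n} |f(x)|\,d\mu < \frac{\varepsilon}{2},$$
and let
$$0 < \delta < \frac{\varepsilon}{2(N + 1)}.$$
Then $\mu(E) < \delta$ implies
$$\left|\int_E f(x)\,d\mu\right| \le \int_E |f(x)|\,d\mu = \int_{E \cap B_N} |f(x)|\,d\mu + \int_{E \cap C_N} |f(x)|\,d\mu \le (N + 1)\mu(E) + \int_{C_N} |f(x)|\,d\mu < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
∎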

Remark. The property figuring in Theorem 6 is expressed by saying that the set function (13) is absolutely continuous with respect to the measure $\mu$.
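A worked example may help show that the $\delta$ in Theorem 6 depends on $f$ and not only on $\varepsilon$. As an illustration of our own, take $f(x) = x^{-1/2}$ on $A = (0, 1]$, which is unbounded but integrable. Suppose $\mu(E) = h < \delta$. Then $E$ and $(0, h]$ have equal measure, $f \ge h^{-1/2}$ on $(0, h]$, and $f \le h^{-1/2}$ off it. Hence the integral over $E$ is at most the integral over $(0, h]$:

$$\int_E x^{-1/2}\,d\mu \le \int_0^{h} x^{-1/2}\,dx = 2\sqrt{h} < 2\sqrt{\delta} = \varepsilon \qquad \text{if } \delta = \frac{\varepsilon^2}{4}.$$

Here $\delta$ shrinks quadratically with $\varepsilon$, reflecting the singularity of $f$ at $0$. For a bounded $f$, the first step of the proof allows $\delta = \varepsilon / \sup|f|$ instead.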


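Problem 1. Prove that the Dirichlet function
$$D(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational} \end{cases}$$
fails to have a Riemann integral over any interval $[a, b]$. Prove that the Lebesgue integral of $D$ over any measurable set $A$ exists and equals zero.

Problem 2. Find the Lebesgue integral of the function
$$f(x) = \begin{cases} \dfrac{1}{q} & \text{if } x = \dfrac{p}{q} \text{ is rational}, \\ 1 & \text{if } x \text{ is irrational} \end{cases}$$
over the interval $[a, b]$.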

Problem 3. Prove that

a) If $f$ is integrable on a set $Z$ of measure zero, then
$$\int_Z f(x)\,d\mu = 0;$$

b) If $f$ is integrable on $A$, then
$$\int_{A'} f(x)\,d\mu = \int_A f(x)\,d\mu$$
for every subset $A' \subset A$ such that $\mu(A - A') = 0$.

Comment. We can regard a) as a limiting case of Theorem 6.

Problem 4. Prove that

a) If $f$ is nonnegative and integrable on $A$, then
$$\int_A f(x)\,d\mu \ge 0;$$

b) If $f$ and $g$ are integrable on $A$ and $f(x) \le g(x)$ almost everywhere, then
$$\int_A f(x)\,d\mu \le \int_A g(x)\,d\mu;$$

c) If $f$ is integrable on $A$ and $m \le f(x) \le M$ almost everywhere, then
$$m\mu(A) \le \int_A f(x)\,d\mu \le M\mu(A).$$

Problem 5. Prove that the existence of either of the integrals
$$\int_A f(x)\,d\mu, \qquad \int_A |f(x)|\,d\mu$$
implies the existence of the other.

Problem 6. Let
$$A = \bigcup_n A_n$$
be a finite or countable union of pairwise disjoint sets $A_n$, and suppose $f$ is integrable on each $A_n$ and satisfies the condition
$$\sum_n \int_{A_n} |f(x)|\,d\mu < \infty. \tag{19}$$
Prove that $f$ is integrable on $A$.

Hint. If $f$ is simple, with values $y_1, y_2, \ldots$, let the sets $B_k$ and $B_{nk}$ be the same as in the proof of Theorem 4. Then
$$\int_{A_n} |f(x)|\,d\mu = \sum_k |y_k|\,\mu(B_{nk}).$$
The convergence of (19) implies the convergence of
$$\sum_n \sum_k |y_k|\,\mu(B_{nk}) = \sum_k |y_k| \sum_n \mu(B_{nk}) = \sum_k |y_k|\,\mu(B_k)$$
and hence the integrability of $f$ on $A$. In the general case, let $g$ be a simple function approximating $f$, and show that (19) implies the convergence of
$$\sum_n \int_{A_n} |g(x)|\,d\mu,$$
so that $g$, and hence $f$, is integrable on $A$.

Comment. This is essentially the converse of Theorem 4.

Problem 7. Let $\mu$ be a $\sigma$-additive measure defined on a Borel algebra $\mathcal{S}_\mu$ of subsets of a given set $X$, and let $f$ be nonnegative and integrable on $X$ (with respect to $\mu$). Prove that the set function
$$F(A) = \int_A f(x)\,d\mu$$
is itself a $\sigma$-additive measure on $\mathcal{S}_\mu$ with the property that $F(A) = 0$ whenever $\mu(A) = 0$.

Problem 8. Suppose $f$ is integrable on sets $A_1, A_2, \ldots, A_n, \ldots$ such that
$$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots,$$
and let
$$A = \bigcap_{n=1}^\infty A_n.$$
Does
$$\int_{A_n} f(x)\,d\mu$$
converge to
$$\int_A f(x)\,d\mu?$$

30. Further Properties of the Lebesgue Integral

30.1. Passage to the limit in Lebesgue integrals. The problem of taking limits under the integral sign, or equivalently of integrating a convergent series term by term, is often encountered in analysis. In the classical theory of integration, it is proved that a sufficient condition for taking such a limit is that the series (or sequence) in question be uniformly convergent. We now examine the corresponding theorems for Lebesgue integrals, which constitute a rather far-reaching generalization of their classical counterparts.

Theorem 1 (Lebesgue’s bounded convergence theorem). Let $\{f_n\}$ be a sequence of functions converging to a limit $f$ on $A$, and suppose
$$|f_n(x)| \le \varphi(x) \quad (x \in A,\ n = 1, 2, \ldots),$$
where $\varphi$ is integrable on $A$. Then $f$ is integrable on $A$ and
$$\lim_{n\to\infty}\int_A f_n(x)\,d\mu = \int_A f(x)\,d\mu.$$

Proof. Clearly $|f(x)| \le \varphi(x)$, and hence $f$ is integrable, by Theorem 3, p. 297. Let
$$A_k = \{x : x \in A,\ k - 1 \le \varphi(x) < k\}, \qquad B_m = \bigcup_{k > m} A_k = \{x : x \in A,\ \varphi(x) \ge m\}.$$
By Theorem 4, p. 298,
$$\int_A \varphi(x)\,d\mu = \sum_k \int_{A_k}\varphi(x)\,d\mu, \tag{1}$$
where the series on the right is absolutely convergent. By the same token,
$$\int_{B_m}\varphi(x)\,d\mu = \sum_{k > m}\int_{A_k}\varphi(x)\,d\mu.$$
Given any $\varepsilon > 0$, there is an integer $m$ such that
$$\int_{B_m}\varphi(x)\,d\mu < \frac{\varepsilon}{5},$$
since the series (1) converges. Moreover, $\varphi(x) < m$ on $A - B_m$. By Egorov’s theorem (Theorem 12, p. 290), $A - B_m$ can be represented in the form
$$A - B_m = C \cup D,$$
where $\{f_n\}$ converges uniformly to $f$ on $C$ and
$$\mu(D) < \frac{\varepsilon}{5m}.$$
Let $N$ be such that
$$|f_n(x) - f(x)| < \frac{\varepsilon}{5\mu(C)}$$
on $C$ if $n > N$. Then
$$\int_A [f_n(x) - f(x)]\,d\mu = \int_{B_m} f_n(x)\,d\mu - \int_{B_m} f(x)\,d\mu + \int_D f_n(x)\,d\mu - \int_D f(x)\,d\mu + \int_C [f_n(x) - f(x)]\,d\mu,$$
and hence
$$\begin{aligned} \left|\int_A f_n(x)\,d\mu - \int_A f(x)\,d\mu\right| &\le \int_{B_m}|f_n(x)|\,d\mu + \int_{B_m}|f(x)|\,d\mu + \int_D |f_n(x)|\,d\mu \\ &\quad + \int_D |f(x)|\,d\mu + \int_C |f_n(x) - f(x)|\,d\mu \\ &< \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + m\cdot\frac{\varepsilon}{5m} + m\cdot\frac{\varepsilon}{5m} + \mu(C)\cdot\frac{\varepsilon}{5\mu(C)} = \varepsilon \end{aligned}$$
for $n > N$, which proves the theorem, since $\varepsilon > 0$ is arbitrary. ∎

Corollary. If $|f_n(x)| \le M$ and $f_n \to f$, then
$$\lim_{n\to\infty}\int_A f_n(x)\,d\mu = \int_A f(x)\,d\mu.$$

Proof. Choose $\varphi(x) \equiv M$, noting that every constant is integrable on $A$. ∎
Remark. The values taken by a function on a set of measure zero have no effect on its integral. Hence in Theorem 1 we need only assume that $\{f_n\}$ converges to $f$ almost everywhere and that the inequality $|f_n(x)| \le \varphi(x)$ holds almost everywhere.
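Two hypothetical examples on $(0, 1]$ with Lebesgue measure illustrate the role of the dominating function $\varphi$. For $f_n(x) = x^n$ we have $|f_n| \le 1$, so the corollary applies: $\int f_n\,d\mu = 1/(n+1) \to 0$, the integral of the limit. For $g_n = n$ on $(0, 1/n]$ and $0$ elsewhere, $g_n(x) \to 0$ at every point, yet $\int g_n\,d\mu = 1$ for all $n$. Here no integrable $\varphi$ exists, since $\sup_n g_n(x) \ge 1/x - 1$ and $\int_0^1 dx/x$ diverges. The sketch records these exact values.

```python
from fractions import Fraction

def integral_power(n):
    """Exact value of the integral of x**n over (0, 1]: 1/(n+1)."""
    return Fraction(1, n + 1)

def spike(n, x):
    """g_n(x) = n on (0, 1/n], 0 elsewhere on (0, 1]."""
    return n if 0 < x <= Fraction(1, n) else 0

def integral_spike(n):
    """Integral of g_n: height n times length 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 3)
print([integral_power(n) for n in (1, 10, 100)])          # tends to 0
print([spike(n, x) for n in range(1, 8)])                  # eventually 0 at x = 1/3
print([integral_spike(n) for n in (1, 10, 100)])           # always 1
```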

Theorem 2 (Levi). Suppose

$$f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots$$

on a set $A$, where the functions $f_n$ are all integrable and

$$\int_A f_n(x)\,d\mu \le M \qquad (n = 1, 2, \ldots) \qquad (2)$$

for some constant $M$. Then the limit

$$f(x) = \lim_{n\to\infty} f_n(x)$$

exists (and is finite) almost everywhere on $A$.⁵ Moreover, $f$ is integrable and

$$\lim_{n\to\infty} \int_A f_n(x)\,d\mu = \int_A f(x)\,d\mu. \qquad (3)$$

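Proof. It can be assumed that $f_1(x) \ge 0$, since otherwise we need only replace $f_n$ by $f_n - f_1$. Let

$$\Omega = \{x : x \in A,\ f_n(x) \to \infty\}.$$

Then clearly

$$\Omega = \bigcap_r \bigcup_n \Omega_n^{(r)},$$

where

$$\Omega_n^{(r)} = \{x : x \in A,\ f_n(x) > r\}.$$

It follows from (2) and Chebyshev's inequality (Theorem 5, p. 299) that

$$\mu\bigl(\Omega_n^{(r)}\bigr) \le \frac{M}{r}.$$

Moreover

$$\mu\Bigl(\bigcup_n \Omega_n^{(r)}\Bigr) \le \frac{M}{r},$$

since

$$\Omega_1^{(r)} \subset \Omega_2^{(r)} \subset \cdots \subset \Omega_n^{(r)} \subset \cdots.$$

But

$$\Omega \subset \bigcup_n \Omega_n^{(r)}$$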
⁵ The function $f$ can be defined in an arbitrary way on the set $E$ where the limit of $f_n(x)$ fails to exist, for example, by setting $f(x) = 0$ on $E$.
for any $r$, and hence

$$\mu(\Omega) \le \frac{M}{r}.$$

Since $r$ can be arbitrarily large, this implies

$$\mu(\Omega) = 0,$$

thereby showing that the sequence $\{f_n(x)\}$ has a finite limit $f(x)$ for almost all $x \in A$.

Now let

$$A_r = \{x : r - 1 \le f(x) < r\},$$

and let $\varphi$ be the simple function such that

$$\varphi(x) = r \quad \text{if } x \in A_r \qquad (r = 1, 2, \ldots).$$

Moreover, let

$$B_s = \bigcup_{r=1}^{s} A_r.$$

Since the functions $f_n$ and $f$ are bounded on $B_s$ and since

$$\varphi(x) \le f(x) + 1,$$

we have

$$\int_{B_s} \varphi(x)\,d\mu \le \int_{B_s} f(x)\,d\mu + \mu(A) = \lim_{n\to\infty} \int_{B_s} f_n(x)\,d\mu + \mu(A) \le M + \mu(A),$$

where we use the corollary to Theorem 1. But

$$\int_{B_s} \varphi(x)\,d\mu = \sum_{r=1}^{s} r\,\mu(A_r),$$

and hence

$$\sum_{r=1}^{s} r\,\mu(A_r) \le M + \mu(A)$$

for all $s = 1, 2, \ldots$. Therefore

$$\sum_{r=1}^{\infty} r\,\mu(A_r) < \infty,$$

i.e., $\varphi$ is integrable on $A$, with integral

$$\int_A \varphi(x)\,d\mu = \sum_{r=1}^{\infty} r\,\mu(A_r).$$

Since $0 \le f_n(x) \le \varphi(x)$, the validity of (3) is now an immediate consequence of Lebesgue's dominated convergence theorem (Theorem 1). ∎


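Corollary. If $\varphi_k(x) \ge 0$ and

$$\sum_{k=1}^{\infty} \int_A \varphi_k(x)\,d\mu < \infty,$$

then the series

$$\sum_{k=1}^{\infty} \varphi_k(x)$$

converges almost everywhere on $A$ and

$$\sum_{k=1}^{\infty} \int_A \varphi_k(x)\,d\mu = \int_A \Bigl(\sum_{k=1}^{\infty} \varphi_k(x)\Bigr)\,d\mu.$$

Proof. Apply Theorem 2 to the functions

$$f_n(x) = \sum_{k=1}^{n} \varphi_k(x).$$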
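To see the corollary at work, consider a hypothetical example of our own: $\varphi_k(x) = x^{k-1}(1 - x)$ on $A = [0, 1]$. Here $\int_0^1 \varphi_k\,dx = \frac{1}{k} - \frac{1}{k+1}$, the partial sums $f_n(x) = 1 - x^n$ increase to $1$ for $x < 1$, and both sides of the corollary equal $1$. The sketch below checks the telescoping with exact rational arithmetic.

```python
from fractions import Fraction

def integral_phi(k):
    """Exact integral over [0, 1] of phi_k(x) = x**(k-1) * (1 - x), i.e. 1/k - 1/(k+1)."""
    return Fraction(1, k) - Fraction(1, k + 1)

def partial_sum_of_integrals(n):
    """Sum of the first n integrals; it telescopes to 1 - 1/(n + 1)."""
    return sum((integral_phi(k) for k in range(1, n + 1)), Fraction(0))

def partial_sum_of_functions(n, x):
    """f_n(x) = phi_1(x) + ... + phi_n(x), which equals 1 - x**n, evaluated exactly."""
    x = Fraction(x)
    return sum((x ** (k - 1) * (1 - x) for k in range(1, n + 1)), Fraction(0))

# The pointwise limit is 1 on [0, 1), so its integral is 1; the partial sums of
# the integrals approach the same value, as the corollary predicts.
for n in (1, 10, 100):
    print(n, partial_sum_of_integrals(n), float(partial_sum_of_functions(n, Fraction(1, 2))))
```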

Theorem 3 (Fatou). Let $\{f_n\}$ be a sequence of nonnegative functions integrable on a set $A$, such that

$$\int_A f_n(x)\,d\mu \le M \qquad (n = 1, 2, \ldots).$$

Suppose $\{f_n\}$ converges almost everywhere on $A$ to a function $f$. Then $f$ is integrable on $A$ and

$$\int_A f(x)\,d\mu \le M.$$

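Proof. Let

$$\varphi_n(x) = \inf_{k \ge n} f_k(x).$$

Then $\varphi_n$ is measurable, since

$$\{x : \varphi_n(x) < c\} = \bigcup_{k \ge n} \{x : f_k(x) < c\}.$$

Moreover

$$0 \le \varphi_n(x) \le f_n(x),$$

and hence $\varphi_n$ is integrable, by Theorem 3, p. 297, with

$$\int_A \varphi_n(x)\,d\mu \le \int_A f_n(x)\,d\mu \le M \qquad (n = 1, 2, \ldots).$$

Clearly

$$\varphi_1(x) \le \varphi_2(x) \le \cdots \le \varphi_n(x) \le \cdots$$

and

$$\lim_{n\to\infty} \varphi_n(x) = f(x)$$

almost everywhere. Applying Theorem 2 to the sequence $\{\varphi_n\}$, we find that $f$ is integrable and

$$\int_A f(x)\,d\mu = \lim_{n\to\infty} \int_A \varphi_n(x)\,d\mu \le M.$$ ∎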
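The inequality in Fatou's theorem can be strict, and the same example shows why Theorem 1 needs an integrable majorant. A standard illustration (ours, not taken from the text): on $[0, 1]$ let $f_n = n$ on $(0, 1/n)$ and $f_n = 0$ elsewhere. Then $\int f_n\,d\mu = 1$ for every $n$, so $M = 1$ works, yet $f_n(x) \to 0$ at every point, so $\int f\,d\mu = 0 < 1$ and $\lim \int f_n\,d\mu \ne \int \lim f_n\,d\mu$.

```python
from fractions import Fraction

def f_n(n, x):
    """f_n = n on the open interval (0, 1/n) and 0 elsewhere on [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f_n(n):
    """Exact Lebesgue integral of f_n over [0, 1]: height n times length 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 7)  # any fixed point of [0, 1]
print([integral_f_n(n) for n in (1, 10, 100)])   # every integral equals 1
print([f_n(n, x) for n in (1, 5, 7, 8, 100)])    # the values at x vanish once 1/n <= x
```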
30.2. The Lebesgue integral over a set of infinite measure. So far all our measures have been finite (except for Remark 3, p. 267), and hence everything said about the Lebesgue integral and its properties has been tacitly understood to apply only to the case of functions defined on sets of finite measure. However, one often deals with functions defined on a set $X$ of infinite measure, for example, the real line equipped with ordinary Lebesgue measure. We will confine ourselves to the case of greatest practical interest, where $X$ can be represented as a union

$$X = \bigcup_n X_n, \qquad \mu(X_n) < \infty \qquad (3)$$

of countably many sets $X_n$, each of finite measure with respect to some σ-additive measure $\mu$ defined on a σ-ring of subsets of $X$ (the sets of finite measure). Such a measure is called σ-finite. For example, Lebesgue measure on the line, in the plane, or more generally in $n$-space is σ-finite. For simplicity, and without loss of generality (why?), we will assume that the sequence $\{X_n\}$ is increasing, i.e., that

$$X_1 \subset X_2 \subset \cdots \subset X_n \subset \cdots. \qquad (4)$$

A sequence $\{X_n\}$ satisfying the conditions (3) and (4) will be called exhaustive. For example, the sequence $\{E_n\}$ in Remark 3, p. 267, is an exhaustive sequence (with respect to ordinary Lebesgue measure), whose union is the whole plane.

Now let $f$ be a measurable function on $X$.⁶ Then $f$ is said to be integrable (or summable) on $X$ if it is integrable on every measurable subset $A \subset X$ and if the limit

$$\lim_{n\to\infty} \int_{X_n} f(x)\,d\mu \qquad (5)$$

exists (and is finite) for every exhaustive sequence $\{X_n\}$. The limit (5) is then called the (Lebesgue) integral of $f$ over the set $X$, denoted by

$$\int_X f(x)\,d\mu.$$

Remark 1. The limit (5) is independent of the choice of the exhaustive sequence $\{X_n\}$. In fact, suppose

$$\lim_{n\to\infty} \int_{X_n} f(x)\,d\mu \ne \lim_{n\to\infty} \int_{X_n^*} f(x)\,d\mu,$$

⁶ A real function $y = f(x)$ is now said to be measurable if the set $f^{-1}(A) \cap X_n$ is measurable for every $X_n$ and every Borel set $A$ (this being the obvious slight generalization of Definition 1, p. 284).




where $\{X_n^*\}$ is another exhaustive sequence. Define a new sequence $\{\Omega_n\}$ such that

$\Omega_1 = X_1$,

$\Omega_{2k}$ is any set of $\{X_n^*\}$ containing $\Omega_{2k-1}$,

$\Omega_{2k+1}$ is any set of $\{X_n\}$ containing $\Omega_{2k}$

(why do such sets exist?). Then $\{\Omega_n\}$ is exhaustive, but

$$\lim_{n\to\infty} \int_{\Omega_n} f(x)\,d\mu$$

fails to exist, contrary to hypothesis.

Remark 2. The integral of a simple function is defined in the same way as on p. 294. It is clear that a necessary (but not sufficient) condition for integrability of a simple function $f$ is that $f$ take every nonzero value on a set of finite measure.
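The role of "every exhaustive sequence" in the definition can be seen in a small computation. The sketch below (our own example, with integrals evaluated from closed-form antiderivatives) uses two exhaustive sequences of intervals on the real line with Lebesgue measure, $[-n, n]$ and $[-n, 2n]$. For $e^{-|x|}$ both give integrals tending to $2$. For $x/(1 + x^2)$ the first gives $0$ for every $n$, while the second tends to $\ln 2$; so the limit (5) depends on the sequence, and this function is not integrable over the line in the sense just defined.

```python
import math

def int_exp_abs(a, b):
    """Exact integral of exp(-|x|) over [a, b], assuming a <= 0 <= b."""
    return (1 - math.exp(a)) + (1 - math.exp(-b))

def int_odd_rational(a, b):
    """Exact integral of x / (1 + x**2) over [a, b]; an antiderivative is log(1 + x**2) / 2."""
    return 0.5 * (math.log1p(b * b) - math.log1p(a * a))

for n in (1, 10, 100):
    sym = (-n, n)        # exhaustive sequence X_n = [-n, n]
    skew = (-n, 2 * n)   # another exhaustive sequence X_n = [-n, 2n]
    print(n,
          int_exp_abs(*sym), int_exp_abs(*skew),            # both tend to 2
          int_odd_rational(*sym), int_odd_rational(*skew))  # 0 versus roughly log 2
```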


30.3. The Lebesgue integral vs. the Riemann integral. Finally we examine 
the relation between the Lebesgue integral and the Riemann integral, 
restricting ourselves to the case of ordinary Lebesgue measure on the line: 

Theorem 4. If the Riemann integral

$$I = \int_a^b f(x)\,dx$$

exists, then $f$ is Lebesgue integrable on $[a, b]$ and

$$\int_{[a,b]} f(x)\,d\mu = I. \qquad (6)$$

Proof. Introducing the points of subdivision

$$x_k = a + \frac{k}{2^n}(b - a) \qquad (k = 0, 1, \ldots, 2^n),$$

we partition $[a, b]$ into $2^n$ subintervals. Let

$$S_n = \frac{b - a}{2^n} \sum_{k=1}^{2^n} M_{nk}, \qquad s_n = \frac{b - a}{2^n} \sum_{k=1}^{2^n} m_{nk}$$

be the corresponding Darboux sums, where $M_{nk}$ is the least upper bound and $m_{nk}$ the greatest lower bound of $f$ on the subinterval $x_{k-1} \le x \le x_k$. By the definition of the Riemann integral,

$$I = \lim_{n\to\infty} S_n = \lim_{n\to\infty} s_n.$$
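The dyadic Darboux sums used in this proof (written $S_n$ for the upper and $s_n$ for the lower sum here) are easy to compute when $f$ is nondecreasing, because then the supremum and infimum on $[x_{k-1}, x_k]$ are $f(x_k)$ and $f(x_{k-1})$. The sketch below assumes monotonicity (an assumption of the code only, not of Theorem 4) and uses exact fractions; for $f(x) = x^2$ on $[0, 1]$ the gap $S_n - s_n$ equals $2^{-n}$ and both sums approach $I = 1/3$.

```python
from fractions import Fraction

def dyadic_darboux_sums(f, a, b, n):
    """Upper and lower Darboux sums on the dyadic partition x_k = a + k (b - a) / 2**n,
    ASSUMING f is nondecreasing on [a, b], so that the sup and inf of f on
    [x_{k-1}, x_k] are f(x_k) and f(x_{k-1})."""
    a, b = Fraction(a), Fraction(b)
    N = 2 ** n
    xs = [a + k * (b - a) / N for k in range(N + 1)]
    width = (b - a) / N
    upper = width * sum(f(xs[k]) for k in range(1, N + 1))
    lower = width * sum(f(xs[k - 1]) for k in range(1, N + 1))
    return upper, lower

def square(x):
    return x * x

for n in (1, 4, 8):
    S, s = dyadic_darboux_sums(square, 0, 1, n)
    print(n, float(S), float(s), S - s)   # here S - s = 1 / 2**n; both tend to 1/3
```

For a nondecreasing $f$ the difference $S_n - s_n$ telescopes to $(b - a)[f(b) - f(a)]/2^n$, which is why the gap halves at each refinement.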
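Consider the functions

$$\overline{f}_n(x) = M_{nk}, \qquad \underline{f}_n(x) = m_{nk} \qquad \text{if } x_{k-1} \le x < x_k,$$

$$\overline{f}_n(b) = \underline{f}_n(b) = f(b).$$

Then clearly

$$\int_{[a,b]} \overline{f}_n(x)\,d\mu = S_n, \qquad \int_{[a,b]} \underline{f}_n(x)\,d\mu = s_n. \qquad (7)$$

Moreover,

$$\overline{f}_1(x) \ge \overline{f}_2(x) \ge \cdots \ge \overline{f}_n(x) \ge \cdots \ge f(x),$$

$$\underline{f}_1(x) \le \underline{f}_2(x) \le \cdots \le \underline{f}_n(x) \le \cdots \le f(x),$$

and hence

$$\lim_{n\to\infty} \overline{f}_n(x) = \overline{f}(x) \ge f(x), \qquad \lim_{n\to\infty} \underline{f}_n(x) = \underline{f}(x) \le f(x).$$

Using (7) and Theorem 2, we find that

$$\int_{[a,b]} \overline{f}(x)\,d\mu = \lim_{n\to\infty} \int_{[a,b]} \overline{f}_n(x)\,d\mu = \lim_{n\to\infty} S_n = I = \lim_{n\to\infty} s_n = \lim_{n\to\infty} \int_{[a,b]} \underline{f}_n(x)\,d\mu = \int_{[a,b]} \underline{f}(x)\,d\mu \qquad (8)$$

(see also Problem 2). Therefore

$$\int_{[a,b]} |\overline{f}(x) - \underline{f}(x)|\,d\mu = \int_{[a,b]} [\overline{f}(x) - \underline{f}(x)]\,d\mu = 0,$$

and hence

$$\overline{f}(x) - \underline{f}(x) = 0$$

almost everywhere, by the corollary on p. 300. In other words,

$$\overline{f}(x) = f(x) = \underline{f}(x) \qquad (9)$$

almost everywhere. Comparing (8) and (9), we get (6). ∎

Problem 1. Prove that

$$\lim_{n\to\infty} \int_A f_n(x) g(x)\,d\mu = \int_A f(x) g(x)\,d\mu$$

if the sequence $\{f_n\}$ satisfies the conditions of Theorem 1 (as stated more generally in the remark on p. 305) and if $g$ is essentially bounded on $A$ in the sense that there is a constant $M > 0$ such that $|g(x)| \le M$ almost everywhere on $A$.
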
Comment. If $g$ is essentially bounded on $A$, then the quantity

$$\mathop{\mathrm{ess\,sup}}_{x \in A} |g(x)| = \inf_{\substack{Z \subset A \\ \mu(Z) = 0}} \ \sup_{x \in A - Z} |g(x)|,$$

called the essential supremum of $g$ on $A$, is finite.

Problem 2. Prove that Theorem 2 remains valid if

$$f_1(x) \ge f_2(x) \ge \cdots \ge f_n(x) \ge \cdots$$

and if (2) is replaced by the condition

$$\int_A f_n(x)\,d\mu \ge M \qquad (n = 1, 2, \ldots).$$

Problem 3. Consider the system $\mathcal{S}$ of all subsets of the real line containing only finitely many points, and let the measure $\mu(A)$ of a set $A \in \mathcal{S}$ be defined as the number of points in $A$. Prove that

a) $\mathcal{S}$ is a ring without a unit;

b) $\mu$ is not σ-finite.

Problem 4. Why do we talk about a σ-ring rather than a σ-algebra on p. 308?

Problem 5. Prove that if a function $f$ vanishes outside a set of finite measure, then its Lebesgue integral as defined on p. 308 coincides with its Lebesgue integral as previously defined.

Problem 6. Show that the analogue of the definition on p. 296 cannot be used to define the Lebesgue integral in the case where $A$ is of infinite measure.

Hint. Give an example of a uniformly convergent sequence $\{f_n\}$ of integrable simple functions such that

$$\lim_{n\to\infty} \int_A f_n(x)\,d\mu$$

fails to exist.

Problem 7. Which of the theorems of Sec. 29 continue to hold for integrals over sets of infinite measure?

Hint. The corollary on p. 298 fails if A is of infinite measure. 

Problem 8. Verify that Theorems 1-3 of Sec. 30.1 continue to hold for 
integrals over sets of infinite measure. 

Problem 9. Given a nonnegative function $f$, suppose the Riemann integral

$$\int_{a+\varepsilon}^{b} f(x)\,dx$$

exists for every $\varepsilon > 0$ and approaches a finite limit as $\varepsilon \to 0+$, so that the improper Riemann integral

$$\int_a^b f(x)\,dx = \lim_{\varepsilon \to 0+} \int_{a+\varepsilon}^{b} f(x)\,dx \qquad (10)$$

exists. Prove that $f$ is Lebesgue integrable on $[a, b]$ and

$$\int_{[a,b]} f(x)\,d\mu = \int_a^b f(x)\,dx.$$


Comment. On the other hand, if $f$ is of variable sign and if

$$\lim_{\varepsilon \to 0+} \int_{a+\varepsilon}^{b} |f(x)|\,dx = \infty,$$

then the Lebesgue integral of $f$ over $[a, b]$ fails to exist, even if the improper Riemann integral (10) exists. In fact, by Problem 5, p. 302, summability of $f$ would imply that of $|f|$.

Problem 10. Prove that the integral

$$\int_0^1 \frac{1}{x} \sin\frac{1}{x}\,dx$$

exists as an improper Riemann integral, but not as a Lebesgue integral.

Problem 11. Suppose $f$ is Riemann integrable over an infinite interval (such an integral can exist only in the improper sense). Prove that $f$ is Lebesgue integrable over the same interval if and only if the improper integral converges absolutely.

Comment. For example, the function

$$f(x) = \frac{\sin x}{x}$$

is not Lebesgue integrable over $(-\infty, \infty)$, since

$$\int_{-\infty}^{\infty} \left|\frac{\sin x}{x}\right| dx = \infty.$$

On the other hand, $f$ has an improper Riemann integral equal to

$$\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx = \pi.$$
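The contrast in this comment can be seen numerically. The sketch below is our own: it applies the composite Simpson rule lobe by lobe on $[0, N\pi]$ (with an arbitrary 200 steps per lobe) to compute $\int_0^{N\pi} \frac{\sin x}{x}\,dx$, which approaches $\pi/2$ (half of $\pi$, by symmetry), and $\int_0^{N\pi} \frac{|\sin x|}{x}\,dx$, which grows roughly like $\frac{2}{\pi}\ln N$. Output of this kind illustrates the divergence but of course does not prove it.

```python
import math

def sinc(x):
    """sin(x)/x, extended continuously by the value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def simpson(g, a, b, steps):
    """Composite Simpson rule for g on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def partial_integrals(N, steps_per_lobe=200):
    """Integrals of sin(x)/x and |sin(x)|/x over [0, N*pi], one lobe [k*pi, (k+1)*pi] at a time."""
    signed = absolute = 0.0
    for k in range(N):
        a, b = k * math.pi, (k + 1) * math.pi
        signed += simpson(sinc, a, b, steps_per_lobe)
        absolute += simpson(lambda x: abs(sinc(x)), a, b, steps_per_lobe)
    return signed, absolute

for N in (10, 100, 1000):
    s, t = partial_integrals(N)
    print(N, round(s, 4), round(t, 4))  # s approaches pi/2; t keeps growing
```

Integrating lobe by lobe keeps the integrand smooth on each subinterval, since $\sin x$ does not change sign inside a lobe.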


9 


DIFFERENTIATION 


Let $f$ be a summable function defined on a space $X$, equipped with a σ-additive measure $\mu$. Then the (Lebesgue) integral

$$\int_E f(x)\,d\mu \qquad (1)$$

exists for every measurable $E \subset X$, thereby defining a set function on the system of all measurable subsets of $X$. If $X$ is the real line, equipped with ordinary Lebesgue measure $\mu$, and if $E = [a, b]$ is a closed interval, we write (1) simply as

$$\int_a^b f(x)\,dx,$$

or equivalently as

$$\int_a^b f(t)\,dt \qquad (2)$$

in terms of the new dummy variable of integration $t$ (here we anticipate subsequent notational convenience). Then (2) is clearly a function of the lower limit of integration $a$ and the upper limit of integration $b$. Suppose we fix $a$, but leave $b$ variable, indicating this by replacing $b$ by the symbol $x$. Then (2) reduces to the "indefinite Lebesgue integral"

$$\int_a^x f(t)\,dt,$$

with its upper limit of integration variable.

Now let $f$ be continuous, and let $F$ have a continuous derivative. Then it will be recalled from elementary calculus that the connection between the operations of differentiation and integration is expressed by the familiar formulas

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x), \qquad (3)$$

$$\int_a^x F'(t)\,dt = F(x) - F(a). \qquad (4)$$

This immediately suggests two questions:

1) Does (3) continue to hold for an arbitrary summable function $f$?

2) What is the largest class of functions for which (4) holds?

These questions will be answered in Secs. 31-33. The study of the general 
set function (1) will be resumed in Sec. 34. 


31. Differentiation of the Indefinite Lebesgue Integral

31.1. Basic properties of monotonic functions. We begin our study of the indefinite Lebesgue integral

$$F(x) = \int_a^x f(t)\,dt \qquad (1)$$

as a function of its upper limit by making the following obvious but important observation. If $f$ is nonnegative, then (1) is a nondecreasing function. Moreover, since every summable function $f(t)$ is the difference

$$f(t) = f_+(t) - f_-(t)$$

of two nonnegative summable functions (which?), the integral (1) is the difference between two nondecreasing functions. Hence, the study of the Lebesgue integral as a function of its upper limit is closely related to the study of monotonic functions. Monotonic functions are interesting in their own right, and have a number of simple and important properties which we now discuss. Here all functions will be regarded as defined on some fixed interval $[a, b]$ unless the contrary is explicitly stated.

Definition 1. A function $f$ is said to be nondecreasing if $x_1 < x_2$ implies $f(x_1) \le f(x_2)$ and nonincreasing if $x_1 < x_2$ implies $f(x_1) \ge f(x_2)$. By a monotonic function is meant a function which is either nondecreasing or nonincreasing.

Definition 2. Given any function $f$, the limit

$$\lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} f(x_0 + \varepsilon)$$

(provided it exists) is called the right-hand limit of $f$ at the point $x_0$, denoted by

$$f(x_0 + 0).$$

Similarly, the limit

$$\lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} f(x_0 - \varepsilon)$$

is called the left-hand limit of $f$ at $x_0$, denoted by

$$f(x_0 - 0).$$

Remark. If

$$f(x_0 + 0) = f(x_0 - 0),$$

then clearly $f$ is either continuous at $x_0$ or has a removable discontinuity at $x_0$.

Definition 3. A function $f$ is said to be continuous from the right at $x_0$ if

$$f(x_0) = f(x_0 + 0),$$

and continuous from the left at $x_0$ if

$$f(x_0) = f(x_0 - 0).$$

Definition 4. By a discontinuity point of the first kind of a function $f$ is meant a point $x_0$ at which the limits $f(x_0 + 0)$ and $f(x_0 - 0)$ exist but are unequal. The difference

$$f(x_0 + 0) - f(x_0 - 0)$$

is then called the jump of $f$ at $x_0$.


Example. Given no more than countably many points

$$x_1, x_2, \ldots, x_n, \ldots$$

in the interval $[a, b]$, let

$$h_1, h_2, \ldots, h_n, \ldots$$

be corresponding positive numbers such that

$$\sum_n h_n < \infty.$$

Then the function

$$f(x) = \sum_{x_n < x} h_n, \qquad (2)$$

where the sum is over all $n$ such that $x_n < x$, is obviously nondecreasing. A monotonic function of this particularly simple type is called a jump function. A jump function such that

$$x_1 < x_2 < \cdots$$

is called a step function. For an example of a jump function which is not a step function, see Problem 1.

We now establish the basic properties of monotonic functions. To be 
explicit, we will talk about nondecreasing functions, but clearly everything 
carries over automatically to the case of nonincreasing functions. 

Theorem 1. Every nondecreasing function $f$ on $[a, b]$ is measurable and bounded, and hence summable.¹

Proof. Since $f(a) \le f(x) \le f(b)$ for all $x \in [a, b]$, $f$ is obviously bounded. Consider the set

$$E_c = \{x : f(x) < c\}.$$

If $E_c$ is empty, then $E_c$ is (trivially) measurable. If $E_c$ is nonempty, let $d$ be the least upper bound of all $x \in E_c$. Then $E_c$ is either the closed interval $[a, d]$, if $d \in E_c$, or the half-open interval $[a, d)$, if $d \notin E_c$. In either case, $E_c$ is measurable. ∎

Theorem 2. Every discontinuity point of a nondecreasing function is of the first kind.

Proof. Let $x_0$ be any point of $[a, b]$, and let $\{x_n\}$ be any increasing sequence such that $x_n < x_0$, $x_n \to x_0$. Then $\{f(x_n)\}$ is a nondecreasing sequence bounded from above, e.g., by the number $f(x_0)$. Therefore

$$\lim_{n\to\infty} f(x_n)$$

exists for any such sequence, and it equals $\sup_{x < x_0} f(x)$, the same value for every such sequence; i.e., $f(x_0 - 0)$ exists. The existence of $f(x_0 + 0)$ is proved in the same way. ∎

Obviously, a nondecreasing function need not be continuous. However, 
we have 

Theorem 3. A nondecreasing function can have no more than countably many points of discontinuity.

Proof. The sum of the jumps of $f$ on the interval $[a, b]$ cannot exceed $f(b) - f(a)$. Let $J_n$ be the set of all jumps greater than $1/n$, and let $J$ be the set of all jumps regardless of size. Then obviously

$$J = \bigcup_{n=1}^{\infty} J_n,$$

where each $J_n$ is a finite set, since it can contain at most $n[f(b) - f(a)]$ elements. Hence $J$ has no more than countably many elements. ∎

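Theorem 4. The jump function (2) is continuous from the left. Moreover, all the discontinuity points of $f$ are of the first kind, with the jump at $x_n$ equal to $h_n$.

Proof. Clearly,

$$f(x - 0) = \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} f(x - \varepsilon) = \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} \sum_{x_n < x - \varepsilon} h_n.$$

But if $x_n < x$, then $x_n < x - \varepsilon$ for sufficiently small $\varepsilon > 0$. Therefore

$$\lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} \sum_{x_n < x - \varepsilon} h_n = \sum_{x_n < x} h_n = f(x),$$

and hence

$$f(x - 0) = f(x).$$

If $x$ coincides with one of the points $x_n$, say with $x_{n_0}$, then

$$f(x_{n_0} + 0) = \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} f(x_{n_0} + \varepsilon) = \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} \sum_{x_n < x_{n_0} + \varepsilon} h_n = \sum_{x_n \le x_{n_0}} h_n,$$

which implies

$$f(x_{n_0} + 0) - f(x_{n_0} - 0) = h_{n_0}.$$ ∎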
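For finitely many jump points the jump function (2) can be evaluated exactly. The sketch below simplifies the text's setting (which allows countably many points) to a finite, hypothetical list of points and heights, builds $f(x) = \sum_{x_n < x} h_n$, and checks at one point that $f(x_0 - \varepsilon) = f(x_0)$ while $f(x_0 + \varepsilon) - f(x_0) = h$, as Theorem 4 asserts. Because the points are finitely many and isolated, a sufficiently small $\varepsilon > 0$ gives the one-sided limits exactly.

```python
from fractions import Fraction

def jump_function(points, heights):
    """Return f(x) = sum of h_n over all n with x_n < x (strict inequality), for
    FINITELY many points x_n and positive heights h_n (a simplifying assumption)."""
    pairs = list(zip(points, heights))
    def f(x):
        return sum((h for p, h in pairs if p < x), Fraction(0))
    return f

# Hypothetical example: jumps of size 1/2, 1/4, 1/8 at 1/3, 1/2, 3/4 in [0, 1].
xs = [Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)]
hs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
f = jump_function(xs, hs)

eps = Fraction(1, 10**6)
x0 = Fraction(1, 2)
print(f(x0 - eps), f(x0), f(x0 + eps))  # left value equals f(x0); right value exceeds it by 1/4
```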

Theorem 5. If $f$ is continuous from the left and nondecreasing, then $f$ is the sum of a continuous nondecreasing function $\varphi$ and a jump function $\psi$.

Proof. If $x_1, x_2, \ldots$ are the discontinuity points of $f$, with corresponding jumps $h_1, h_2, \ldots$, let

$$\psi(x) = \sum_{x_n < x} h_n, \qquad \varphi(x) = f(x) - \psi(x).$$

Then

$$\varphi(x'') - \varphi(x') = [f(x'') - f(x')] - [\psi(x'') - \psi(x')],$$

where the expression on the right is the difference between the total increment of $f$ on the interval $[x', x'']$ and the sum of its jumps on $[x', x'']$, i.e., $\varphi(x'') - \varphi(x')$ is the measure of the set of values taken by $f$ at its continuity points in $[x', x'']$. This quantity is clearly nonnegative, and hence $\varphi$ is nondecreasing. Moreover, given any point $x \in [a, b]$, we have

$$\varphi(x - 0) = \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} f(x - \varepsilon) - \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} \psi(x - \varepsilon) = f(x - 0) - \sum_{x_n < x} h_n,$$

$$\varphi(x + 0) = \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} f(x + \varepsilon) - \lim_{\substack{\varepsilon \to 0 \\ \varepsilon > 0}} \psi(x + \varepsilon) = f(x + 0) - \sum_{x_n \le x} h_n,$$

and hence

$$\varphi(x + 0) - \varphi(x - 0) = f(x + 0) - f(x - 0) - h = 0,$$

where $h$ is the jump of $\psi$ at $x$. It follows that $\varphi$ is continuous at every point $x \in [a, b]$. ∎


1 See the corollary on p. 298. 
31.2. Differentiation of a monotonic function. The key result of this section (see Theorem 6 below) will be to show that a monotonic function $f$ defined on an interval $[a, b]$ has a finite derivative almost everywhere on $[a, b]$. Before proving this proposition, due to Lebesgue, we must first introduce some further definitions and then establish three preliminary lemmas.

The derivative of a function $f$ at a point $x_0$ is defined in the familiar way as the limit of the ratio

$$\frac{f(x) - f(x_0)}{x - x_0} \qquad (3)$$

as $x \to x_0$. Even if this limit fails to exist, the following four quantities (which may take infinite values) always exist:²

1) The lower limit of (3) as $x \to x_0$ from the left, denoted by $\lambda_L$;

2) The upper limit of (3) as $x \to x_0$ from the left, denoted by $\Lambda_L$;

3) The lower limit of (3) as $x \to x_0$ from the right, denoted by $\lambda_R$;

4) The upper limit of (3) as $x \to x_0$ from the right, denoted by $\Lambda_R$.

These four quantities, with the geometric meaning shown in Figure 17, are called the derived numbers of $f$ at $x_0$.³ It is clear that the inequalities

$$\lambda_L \le \Lambda_L, \qquad \lambda_R \le \Lambda_R \qquad (4)$$

always hold. If $\lambda_L$ and $\Lambda_L$ are equal, their common value is just the left-hand derivative of $f$ at $x_0$. Similarly, if $\lambda_R$ and $\Lambda_R$ are equal, their common value is just the right-hand derivative of $f$ at $x_0$. Moreover, $f$ has a derivative at $x_0$ if and only if all four derived numbers $\lambda_L$, $\Lambda_L$, $\lambda_R$ and $\Lambda_R$ are equal at $x_0$. Hence the italicized assertion at the beginning of this section can be restated as follows: For a monotonic function defined on an interval $[a, b]$, the formula

$$-\infty < \lambda_L = \Lambda_L = \lambda_R = \Lambda_R < +\infty$$

holds almost everywhere on $[a, b]$.

² Upper and lower limits are defined on p. 111.

³ To distinguish these quantities further, we can call $\lambda_L$ the left-hand lower derived number, $\Lambda_R$ the right-hand upper derived number, and so on.

Definition 5. Let $f$ be a continuous function defined on an interval $[a, b]$. Then a point $x_0 \in [a, b]$ is said to be invisible from the right (with respect to $f$) if there is a point $\xi$ such that $x_0 < \xi \le b$ and $f(x_0) < f(\xi)$, and invisible from the left if there is a point $\xi$ such that $a \le \xi < x_0$ and

$$f(x_0) < f(\xi).$$

Example. In Figure 18, the points belonging to the intervals $[a_1, b_1)$ and $(a_2, b_2)$ are invisible from the right (interpret the word "invisible").

Lemma 1 (F. Riesz). The set of all points invisible from the right with respect to a function $f$ continuous on $[a, b]$ is the union of no more than countably many pairwise disjoint open intervals $(a_k, b_k)$,⁴ such that

$$f(a_k) \le f(b_k) \qquad (k = 1, 2, \ldots). \qquad (5)$$

Proof. If $x_0$ is invisible from the right with respect to $f$, then the same is true of any point sufficiently close to $x_0$, by the continuity of $f$. Hence the set of all points invisible from the right is an open set $G$. It follows from Theorem 6, p. 51 that $G$ is the union of a finite or countable system of pairwise disjoint open intervals. Let $(a_k, b_k)$ be one of these intervals, and suppose

$$f(a_k) > f(b_k). \qquad (6)$$

⁴ However, if $a_1 = a$ (say), then in some cases $(a_1, b_1)$ should be replaced by the half-open interval $[a_1, b_1)$, as in Figure 18. This is permissible, since $[a_1, b_1)$ is open relative to $[a, b]$.
Then there is an (interior) point $x_0 \in (a_k, b_k)$ such that $f(x_0) > f(b_k)$. Of the points $x \in (a_k, b_k)$ such that $f(x) = f(x_0)$, let $x^*$ be the one with largest abscissa ($x^*$ may coincide with $x_0$). Since $x^*$ belongs to $(a_k, b_k)$ and hence is invisible from the right, there is a point $\xi > x^*$ such that $f(\xi) > f(x^*)$. Clearly $\xi$ cannot belong to $(a_k, b_k)$, since $x^*$ is the point $x$ with largest abscissa for which $f(x) = f(x_0)$, while $f(b_k) < f(x_0)$, so that $\xi \in (a_k, b_k)$ would imply (by the intermediate value theorem applied on $[\xi, b_k]$) the existence of a point $x > x^*$ such that $f(x) = f(x_0)$. On the other hand, the inequality $\xi > b_k$ is also impossible, since it would imply $f(b_k) < f(x_0) < f(\xi)$ despite the fact that $b_k$ is not invisible from the right. Thus (6) leads to a contradiction (obviously $\xi \ne b_k$). It follows that $f(a_k) \le f(b_k)$. ∎
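Lemma 1 is often called Riesz's "rising sun" lemma. For a continuous piecewise linear function the set of points invisible from the right can be computed exactly, which gives a concrete check of (5). The sketch below is our own construction, restricted to piecewise linear functions given by their nodes. It uses the suffix maxima $R_i = \max_{j \ge i} y_j$ and the observation (valid for such functions) that a point $x$ of the segment $[t_i, t_{i+1})$ is invisible from the right exactly when $f(x) < R_{i+1}$. A first interval that starts at the left endpoint is really half-open, as in footnote 4; the code does not mark this case separately.

```python
from fractions import Fraction

def rising_sun_intervals(ts, ys):
    """Maximal intervals of points invisible from the right (Definition 5) for the
    continuous piecewise linear function through the nodes (ts[i], ys[i]).
    Assumes ts is strictly increasing; uses exact Fraction arithmetic."""
    ts = [Fraction(t) for t in ts]
    ys = [Fraction(y) for y in ys]
    # R[i] = max(ys[i:]). For x in [ts[i], ts[i+1]) the supremum of f over points to the
    # right of x is max(f(x), R[i+1]), so x is invisible from the right iff f(x) < R[i+1].
    R = ys[:]
    for i in range(len(ys) - 2, -1, -1):
        R[i] = max(ys[i], R[i + 1])
    intervals = []                          # entries [a_k, b_k]
    for i in range(len(ts) - 1):
        t0, t1, y0, y1 = ts[i], ts[i + 1], ys[i], ys[i + 1]
        level = R[i + 1]                    # note that y1 <= level always
        if y0 < level:                      # the whole segment [t0, t1) lies below level
            lo, node_invisible = t0, True
        elif y1 < level:                    # f falls from y0 >= level to y1 < level
            lo = t0 + (y0 - level) * (t1 - t0) / (y0 - y1)
            node_invisible = False
        else:                               # f >= level on the whole segment
            continue
        if node_invisible and intervals and intervals[-1][1] == t0:
            intervals[-1][1] = t1           # t0 is invisible: extend the previous interval
        else:
            intervals.append([lo, t1])
    return [tuple(iv) for iv in intervals]

def pl_value(ts, ys, x):
    """Evaluate the piecewise linear interpolant at x, assuming ts[0] <= x <= ts[-1]."""
    x = Fraction(x)
    for i in range(len(ts) - 1):
        t0, t1 = Fraction(ts[i]), Fraction(ts[i + 1])
        if t0 <= x <= t1:
            return Fraction(ys[i]) + (Fraction(ys[i + 1]) - ys[i]) * (x - t0) / (t1 - t0)
    raise ValueError("x outside [ts[0], ts[-1]]")

ts, ys = [0, 1, 2, 3, 4, 5], [3, 1, 2, 0, 1, 0]
for a_k, b_k in rising_sun_intervals(ts, ys):
    print(a_k, b_k, pl_value(ts, ys, a_k) <= pl_value(ts, ys, b_k))  # inequality (5)
```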

Lemma 1'. The set of all points invisible from the left with respect to a function $f$ continuous on $[a, b]$ is the union of no more than countably many pairwise disjoint open intervals $(a_k, b_k)$, such that

$$f(a_k) \ge f(b_k) \qquad (k = 1, 2, \ldots).$$

Proof. Virtually the same as that of Lemma 1. ∎

Lemma 2. Let $f$ be a continuous nondecreasing function on $[a, b]$, with $\lambda_L$ and $\Lambda_R$ as two of its derived numbers. Given any numbers $c$, $C$ and $p$ such that

$$0 < c < C < \infty, \qquad p = \frac{c}{C},$$

let $E_p$ be the set

$$E_p = \{x : \lambda_L < c,\ \Lambda_R > C\}.$$

Then

$$\mu\{x : x \in E_p \cap (\alpha, \beta)\} \le p(\beta - \alpha)$$

for every open interval $(\alpha, \beta) \subset [a, b]$.

Proof. Let $x_0$ be a point of $(\alpha, \beta)$ for which $\lambda_L < c$. Then there is a point $\xi < x_0$ such that

$$\frac{f(\xi) - f(x_0)}{\xi - x_0} < c,$$

i.e., such that

$$f(\xi) - c\xi > f(x_0) - cx_0.$$

Therefore $x_0$ is invisible from the left with respect to the function $f(x) - cx$. Hence, by Lemma 1', the set of all such $x_0$ is the union of no more than countably many pairwise disjoint open intervals $(\alpha_k, \beta_k) \subset (\alpha, \beta)$, where

$$f(\alpha_k) - c\alpha_k \ge f(\beta_k) - c\beta_k,$$

i.e.,

$$f(\beta_k) - f(\alpha_k) \le c(\beta_k - \alpha_k). \qquad (7)$$

Let $G_k$ be the set of points in $(\alpha_k, \beta_k)$ for which $\Lambda_R > C$. Then, by virtually the same argument together with Lemma 1, $G_k$ is the union of no more than countably many pairwise disjoint open intervals $(\alpha_{kn}, \beta_{kn})$, where

$$\beta_{kn} - \alpha_{kn} \le \frac{1}{C}[f(\beta_{kn}) - f(\alpha_{kn})] \qquad (8)$$

(why?). Clearly $E_p \cap (\alpha, \beta)$ is covered by the system of intervals $(\alpha_{kn}, \beta_{kn})$. Moreover, it follows from (7) and (8) that

$$\sum_{k,n} (\beta_{kn} - \alpha_{kn}) \le \frac{1}{C} \sum_{k,n} [f(\beta_{kn}) - f(\alpha_{kn})] \le \frac{1}{C} \sum_k [f(\beta_k) - f(\alpha_k)] \le \frac{c}{C} \sum_k (\beta_k - \alpha_k) \le p(\beta - \alpha).$$ ∎

We are now in a position to prove

Theorem 6 (Lebesgue). A monotonic function $f$ defined on an interval $[a, b]$ has a finite derivative almost everywhere on $[a, b]$.

Proof. There is no loss of generality in assuming that $f$ is nondecreasing, since if $f$ is nonincreasing, then obviously $-f$ is nondecreasing. But if $-f$ has a derivative almost everywhere, then so does $f$. We also assume that $f$ is continuous, dropping this restriction at the end of the proof. It will be enough to show that the two inequalities

$$\Lambda_R < +\infty \qquad (9)$$

and

$$\lambda_L \ge \Lambda_R \qquad (10)$$

hold almost everywhere on $[a, b]$, for any continuous nondecreasing function. In fact, setting $f^*(x) = -f(-x)$, we see that $f^*$ is continuous and nondecreasing, like $f$ itself. Moreover, it is easily verified that

$$\lambda_L^* = \lambda_R, \qquad \Lambda_R^* = \Lambda_L,$$

where $\lambda_L^*$ and $\Lambda_R^*$ are the indicated derived numbers of $f^*$. Therefore, applying (10) to $f^*$, we get

$$\lambda_L^* \ge \Lambda_R^*,$$

or

$$\lambda_R \ge \Lambda_L. \qquad (11)$$

Combining the inequalities (10) and (11) and using (4), we obtain

$$\Lambda_R \le \lambda_L \le \Lambda_L \le \lambda_R \le \Lambda_R,$$

or equivalently

$$\lambda_L = \Lambda_L = \lambda_R = \Lambda_R.$$

Thus if (9) and (10) hold almost everywhere, we have⁵

$$-\infty < \lambda_L = \Lambda_L = \lambda_R = \Lambda_R < +\infty$$

almost everywhere, and the theorem is proved.

To prove that $\Lambda_R < +\infty$ almost everywhere, we argue as follows: If $\Lambda_R = +\infty$ at some point $x_0$, then, given any constant $C > 0$, there is a point $\xi > x_0$ such that

$$\frac{f(\xi) - f(x_0)}{\xi - x_0} > C,$$

i.e.,

$$f(\xi) - f(x_0) > C(\xi - x_0),$$

or

$$f(\xi) - C\xi > f(x_0) - Cx_0.$$

Thus $x_0$ is invisible from the right with respect to the function $f(x) - Cx$. Hence, by Lemma 1, the set of all points $x_0$ at which $\Lambda_R = +\infty$ is contained in the union of no more than countably many open intervals $(a_k, b_k)$, whose end points satisfy the inequalities

$$f(a_k) - Ca_k \le f(b_k) - Cb_k$$

or

$$f(b_k) - f(a_k) \ge C(b_k - a_k).$$

Dividing by $C$ and summing over all the intervals $(a_k, b_k)$, we get

$$\sum_k (b_k - a_k) \le \frac{1}{C} \sum_k [f(b_k) - f(a_k)] \le \frac{f(b) - f(a)}{C}.$$

But $C$ can be made arbitrarily large. Hence the set of points where $\Lambda_R = +\infty$ can be covered by a collection of intervals the sum of whose lengths is arbitrarily small. It follows that this set is of measure zero, i.e., that $\Lambda_R < +\infty$ almost everywhere.

To prove that $\lambda_L \ge \Lambda_R$ almost everywhere, let the numbers $c$, $C$, $p$ and the set $E_p$ be the same as in Lemma 2. It will then follow that $\lambda_L \ge \Lambda_R$ almost everywhere if we succeed in showing that $\mu(E_p) = 0$, since the set of points where $\lambda_L < \Lambda_R$ can clearly be represented as the union of no more than countably many sets of the form $E_p$ (why?). Let $\mu(E_p) = t$. Then, given any $\varepsilon > 0$, there is an open set $G$, equal to the union of no more than countably many pairwise disjoint open intervals $(a_k, b_k)$, such that $E_p \subset G$ and

$$\sum_k (b_k - a_k) < t + \varepsilon$$

(this follows from the very definition of Lebesgue measure on the line). If

$$t_k = \mu[E_p \cap (a_k, b_k)],$$

then

$$t = \sum_k t_k.$$

But $t_k \le p(b_k - a_k)$, by Lemma 2. Hence

$$t \le p \sum_k (b_k - a_k) < p(t + \varepsilon),$$

which implies $t \le pt$, since $\varepsilon > 0$ is arbitrary. This in turn implies $t = 0$, since $0 < p < 1$. Therefore $\lambda_L \ge \Lambda_R$ almost everywhere, as asserted.

Finally, to drop the requirement that $f$ be continuous, we need only generalize Lemmas 1 and 1' in the way indicated in Problem 6, noting that the proof continues to go through (check details).⁶ ∎

⁵ Note that none of the derived numbers can equal $-\infty$, since the difference quotient (3) is inherently nonnegative if $f$ is nondecreasing.

Remark. Despite its apparent complexity, the proof of Theorem 6 is based on simple intuitive ideas. For example, the finiteness of $\Lambda_R$ (and $\Lambda_L$) almost everywhere is easily made plausible. In fact, let $f$ be continuous and nondecreasing on $[a, b]$. Then $f$ maps $[a, b]$ into the interval $[f(a), f(b)]$, at the same time subjecting a small interval $[x, \xi]$ at $x$ to a "magnification" approximately equal to

$$\frac{f(\xi) - f(x)}{\xi - x}.$$

But the interval $[f(a), f(b)]$ is finite, and hence the magnification (i.e., the derivative) cannot be infinite on a set of positive measure. As for the part of the proof based on Lemma 2, it merely says that if the intersection of a subset $A \subset [a, b]$ with every interval $(\alpha, \beta)$ has measure no greater than $p(\beta - \alpha)$ for some fixed number $p < 1$, then $A$ cannot have positive measure.

31.3. Differentiation of an integral with respect to its upper limit. Returning to the problem of differentiating the indefinite Lebesgue integral, we have

Theorem 7. Let $f$ be any function summable on $[a, b]$. Then

$$\frac{d}{dx} \int_a^x f(t)\,dt \qquad (12)$$

exists and is finite for almost all $x$.

Proof. As noted at the beginning of Sec. 31.1,

$$f(t) = f_+(t) - f_-(t),$$

6 For an alternative proof, see Problems 7-9. 
where $f_+$ and $f_-$ are nonnegative summable functions, so that

$$F(x) = \int_a^x f(t)\,dt = \int_a^x f_+(t)\,dt - \int_a^x f_-(t)\,dt = F_1(x) - F_2(x)$$

is the difference between two nondecreasing functions $F_1$ and $F_2$. But $F_1$ and $F_2$ have finite derivatives almost everywhere, by Theorem 6, and hence so does $F$. ∎

We now evaluate the derivative (12), thereby giving an affirmative answer to the first of the two questions posed on p. 314:

Theorem 8. Let $f$ be any function summable on $[a, b]$. Then

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x)$$

almost everywhere.

Proof. Let

$$F(x) = \int_a^x f(t)\,dt.$$

Then it will be enough to show that

$$f(x) \ge F'(x) \qquad (13)$$

almost everywhere for any summable function. In fact, changing $f(x)$ to $-f(x)$ in (13), we get

$$-f(x) \ge -F'(x)$$

and hence

$$f(x) \le F'(x). \qquad (14)$$

But (13) and (14) together imply the desired result

$$f(x) = F'(x) = \frac{d}{dx} \int_a^x f(t)\,dt$$

(almost everywhere).

To prove (13), we observe that if

$$f(x) < F'(x),$$

then there are rational numbers $\alpha$ and $\beta$ such that

$$f(x) < \alpha < \beta < F'(x). \qquad (15)$$

Let $E_{\alpha\beta}$ be the set of all $x$ satisfying (15). Then, as we now show, $\mu(E_{\alpha\beta}) = 0$. Since the number of sets $E_{\alpha\beta}$ is countable, this will imply

$$\mu\{x : f(x) < F'(x)\} = 0$$

and hence that (13) holds almost everywhere.
To prove that $\mu(E_{\alpha\beta}) = 0$, we first note that, given any $\varepsilon > 0$, there is a $\delta > 0$ such that $\mu(E) < \delta$ implies

$$\left|\int_E f(t)\,dt\right| < \varepsilon$$

(the existence of such a number $\delta$ follows from the absolute continuity of the Lebesgue integral, proved in Theorem 6, p. 300).⁷ Let $G \subset [a, b]$ be an open set, made up of no more than countably many pairwise disjoint open intervals $(a_k, b_k)$, such that

$$E_{\alpha\beta} \subset G, \qquad \mu(G) < \mu(E_{\alpha\beta}) + \delta,$$

and let $x_0$ be any point in $G_k = E_{\alpha\beta} \cap (a_k, b_k)$. Then

$$\frac{F(\xi) - F(x_0)}{\xi - x_0} > \beta \qquad (16)$$

for any point $\xi > x_0$ sufficiently close to $x_0$. Writing (16) in the form

$$F(\xi) - \beta\xi > F(x_0) - \beta x_0,$$

we see that the point $x_0$ is invisible from the right with respect to the continuous function $F(x) - \beta x$. It follows from Lemma 1 that $G_k$ is contained in the union of no more than countably many pairwise disjoint open intervals $(a_{kn}, b_{kn})$, where

$$F(a_{kn}) - \beta a_{kn} \le F(b_{kn}) - \beta b_{kn},$$

i.e.,

$$F(b_{kn}) - F(a_{kn}) \ge \beta(b_{kn} - a_{kn}),$$

or equivalently

$$\int_{a_{kn}}^{b_{kn}} f(t)\,dt \ge \beta(b_{kn} - a_{kn}). \qquad (17)$$

If

$$S = \bigcup_{k,n} (a_{kn}, b_{kn}),$$

then clearly

$$E_{\alpha\beta} \subset S \subset G, \qquad \mu(S) < \mu(E_{\alpha\beta}) + \delta.$$

Summing (17) over all the intervals $(a_{kn}, b_{kn})$, we get

$$\int_S f(t)\,dt = \sum_{k,n} \int_{a_{kn}}^{b_{kn}} f(t)\,dt \ge \beta \sum_{k,n} (b_{kn} - a_{kn}) = \beta\mu(S).$$
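On the other hand,

$$\int_S f(t)\,dt = \int_{E_{\alpha\beta}} f(t)\,dt + \int_{S - E_{\alpha\beta}} f(t)\,dt \le \alpha\mu(E_{\alpha\beta}) + \varepsilon \le \alpha\mu(S) + |\alpha|\delta + \varepsilon. \qquad (18)$$

Comparing (17) and (18), we get

$$\alpha\mu(S) + |\alpha|\delta + \varepsilon \ge \beta\mu(S)$$

or

$$\mu(S) \le \frac{|\alpha|\delta + \varepsilon}{\beta - \alpha}.$$

Therefore $E_{\alpha\beta}$ is contained in an open set of arbitrarily small measure (it can be assumed that $|\alpha|\delta < \varepsilon$). It follows that $\mu(E_{\alpha\beta}) = 0$. ∎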
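Theorem 8 claims equality only almost everywhere, and a simple step function shows why. In the sketch below (our own example, with $a = 0$) $f = 1$ on $[0, \frac12)$ and $f = 3$ on $[\frac12, 1]$, so $F(x) = \int_0^x f(t)\,dt$ is piecewise linear and known in closed form. Exact difference quotients with a small step $h$ reproduce $f(x)$ at $x = \frac14$ and $x = \frac34$, while at $x = \frac12$ the left and right quotients are $1$ and $3$, so $F'(\frac12)$ does not exist; this exceptional set has measure zero.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def f(t):
    """A summable step function on [0, 1]: 1 on [0, 1/2), 3 on [1/2, 1]."""
    return 1 if t < HALF else 3

def F(x):
    """Indefinite integral F(x) = integral of f over [0, x], in closed form."""
    x = Fraction(x)
    return x if x <= HALF else HALF + 3 * (x - HALF)

def right_quotient(x, h):
    return (F(x + h) - F(x)) / h

def left_quotient(x, h):
    return (F(x) - F(x - h)) / h

h = Fraction(1, 10**6)
for x in (Fraction(1, 4), HALF, Fraction(3, 4)):
    print(x, f(x), right_quotient(x, h), left_quotient(x, h))
```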

Problem 1. Let $x_1, x_2, \ldots, x_n, \ldots$ be the set of all rational points in $[a, b]$, enumerated in any way, and let $h_n = 1/2^n$. Prove that the jump function

$$f(x) = \sum_{x_n < x} h_n$$

is discontinuous at every rational point and continuous at every irrational point.

Problem 2. Suppose we define a jump function by the formula

$$f(x) = \sum_{x_n \le x} h_n \qquad (19)$$

rather than by the formula (2). Prove that $f$ is continuous from the right, rather than from the left as in Theorem 4.


Problem 3. Find the derived numbers of the function

$$f(x) = \begin{cases} x \sin \dfrac{1}{x} & \text{if } x > 0, \\ 0 & \text{if } x \le 0 \end{cases}$$

at the point $x = 0$.

Problem 4. Find the points invisible from the left in Figure 18, p. 319. 

Problem 5. In Lemma 1, show that $f(a_k) = f(b_k)$ if $a_k \ne a$.

Problem 6. Prove that the requirement that $f$ be continuous on $[a, b]$ can be dropped in Lemma 1, provided that

1) The discontinuity points of $f$ are all of the first kind;

2) A point $x_0 \in [a, b]$ is said to be invisible from the right (with respect to $f$) if there is a point $\xi$ such that $x_0 < \xi \le b$ and

$$\max\{f(x_0 - 0), f(x_0), f(x_0 + 0)\} < f(\xi);$$

3) The inequality (5) is replaced by

$$f(a_k + 0) \le \max\{f(b_k - 0), f(b_k), f(b_k + 0)\}.$$

State and prove the corresponding generalization of Lemma 1'.

Problem 7. Let

$$\sum_{n=0}^{\infty} \varphi_n(x) = f(x) \tag{20}$$

be an everywhere convergent series, whose general term $\varphi_n(x)$ is nondecreasing
(alternatively, nonincreasing) on $[a, b]$. Prove that (20) can be differentiated
term by term almost everywhere, i.e., that

$$\sum_{n=0}^{\infty} \varphi_n'(x) = f'(x)$$

almost everywhere.

Problem 8. Prove that every jump function has a zero derivative almost 
everywhere. 

Hint. Use Problem 7. 

Problem 9. Prove that the assumption that f be continuous from the left 
in Theorem 5 can be dropped if we define a jump function as a sum of a 
“left jump function” like (2) and a “right jump function” like (19). Use 
this fact and Problem 8 to complete the proof of Theorem 6 without recourse 
to Problem 6. 

Hint. Use Problem 8 and Theorem 5. 

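Problem 10. Following van der Waerden, let

$$\varphi_0(x) = \begin{cases} x & \text{if } 0 \le x \le \tfrac{1}{2}, \\ 1 - x & \text{if } \tfrac{1}{2} \le x \le 1, \end{cases}$$

and continue $\varphi_0$ by periodicity, with period 1, over the whole $x$-axis. Then
let

$$\varphi_n(x) = \frac{1}{4^n}\,\varphi_0(4^n x) \qquad (n = 1, 2, \ldots),$$

$$f(x) = \sum_{n=0}^{\infty} \varphi_n(x).$$

Prove that

a) The function $f$ is continuous everywhere;

b) The derivative of $f$ fails to exist at every point $x_0 \in (-\infty, \infty)$.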

Hint. Consider the increments 

f ( x o ± 4) 
32. Functions of Bounded Variation

The problem of differentiating a Lebesgue integral with respect to its 
upper limit has led us to consider functions that can be represented as 
differences between two monotonic functions. We now give a different 
description of such functions (independent of the notion of monotonicity), 
afterwards studying some of their properties. 

Definition 1. A function $f$ defined on an interval $[a, b]$ is said to be
of bounded variation if there is a constant $C > 0$ such that

$$\sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| \le C \tag{1}$$

for every partition

$$a = x_0 < x_1 < \cdots < x_n = b \tag{2}$$

of $[a, b]$ by points of subdivision $x_0, x_1, \ldots, x_n$.

Example. Every monotonic function is of bounded variation, since the
left-hand side of (1) equals $|f(b) - f(a)|$ regardless of the choice of partition.

Definition 2. Let $f$ be a function of bounded variation. Then by the
total variation of $f$ on $[a, b]$, denoted by $V_a^b(f)$, is meant the quantity

$$V_a^b(f) = \sup \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})|, \tag{3}$$

where the least upper bound is taken over all (finite) partitions (2) of the
interval $[a, b]$.
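The sums in (1) and the supremum in (3) are easy to explore numerically. The sketch below (Python; the helper names `variation_sum`, `uniform_partition` and `approx_total_variation` are ours, not the book's) evaluates the left-hand side of (1) for a given partition and, since every such sum is a lower bound for $V_a^b(f)$, uses a fine uniform partition as an estimate. It assumes $f$ can be evaluated anywhere on $[a, b]$; for a wildly oscillating $f$ a uniform partition may badly underestimate the supremum.

```python
import math
from typing import Callable, List, Sequence


def variation_sum(f: Callable[[float], float], points: Sequence[float]) -> float:
    """Left-hand side of (1): sum of |f(x_k) - f(x_{k-1})| over one partition."""
    return sum(abs(f(points[k]) - f(points[k - 1])) for k in range(1, len(points)))


def uniform_partition(a: float, b: float, n: int) -> List[float]:
    """The partition a = x_0 < x_1 < ... < x_n = b into n equal subintervals."""
    return [a + (b - a) * k / n for k in range(n + 1)]


def approx_total_variation(f: Callable[[float], float], a: float, b: float,
                           n: int = 4096) -> float:
    """Lower estimate of V_a^b(f) in (3), taken from a single uniform partition."""
    return variation_sum(f, uniform_partition(a, b, n))


coarse = variation_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 3))
fine = approx_total_variation(math.sin, 0.0, 2 * math.pi)
print(f"3 pieces: {coarse:.6f}   4096 pieces: {fine:.6f}")
```

For $\sin$ on $[0, 2\pi]$ the three-piece partition gives $2\sqrt{3} \approx 3.46$, while the fine partition comes out at the exact value $V_0^{2\pi}(\sin) = 4$ (a rise of 1, a fall of 2 and a rise of 1), illustrating that refining a partition never decreases the sum.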

Remark 1. A function $f$ defined on the whole real line $(-\infty, \infty)$ is said
to be of bounded variation if there is a constant $C > 0$ such that

$$V_a^b(f) \le C$$

for every pair of real numbers $a$ and $b$ ($a < b$). The quantity

$$\lim_{\substack{a \to -\infty \\ b \to +\infty}} V_a^b(f)$$

is then called the total variation of $f$ on $(-\infty, \infty)$, denoted by $V_{-\infty}^{\infty}(f)$.

Remark 2. It is an immediate consequence of (3) that

$$V_a^b(\alpha f) = |\alpha|\, V_a^b(f) \tag{4}$$

for any constant $\alpha$.

Theorem 1. If $f$ and $g$ are functions of bounded variation on $[a, b]$,
then so is $f + g$ and

$$V_a^b(f + g) \le V_a^b(f) + V_a^b(g). \tag{5}$$

Proof. For any partition of the interval $[a, b]$, we have

$$\sum_k |f(x_k) + g(x_k) - f(x_{k-1}) - g(x_{k-1})| \le \sum_k |f(x_k) - f(x_{k-1})| + \sum_k |g(x_k) - g(x_{k-1})|.$$

Taking the least upper bound of both sides over all partitions of
$[a, b]$, and noting that

$$\sup\{x + y : x \in A,\ y \in B\} \le \sup\{x : x \in A\} + \sup\{y : y \in B\},$$

we immediately get (5). ∎

It follows from (4) and (5) that any linear combination of functions of
bounded variation is itself a function of bounded variation. In other words,
the set of all functions of bounded variation on a given interval is a linear
space (unlike the set of all monotonic functions).

Theorem 2. If $a < b < c$, then

$$V_a^c(f) = V_a^b(f) + V_b^c(f). \tag{6}$$

Proof. First we consider a partition of the interval $[a, c]$ such that
$b$ is one of the points of subdivision, say $x_r = b$. Then

$$\sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| = \sum_{k=1}^{r} |f(x_k) - f(x_{k-1})| + \sum_{k=r+1}^{n} |f(x_k) - f(x_{k-1})| \le V_a^b(f) + V_b^c(f). \tag{7}$$

Now consider an arbitrary partition of $[a, c]$. It is clear that adding an
extra point of subdivision to this partition can never decrease the sum

$$\sum_{k=1}^{n} |f(x_k) - f(x_{k-1})|.$$

Therefore (7) holds for any subdivision of $[a, c]$, and hence

$$V_a^c(f) \le V_a^b(f) + V_b^c(f). \tag{8}$$

On the other hand, given any $\varepsilon > 0$, there are partitions of the intervals
$[a, b]$ and $[b, c]$, respectively, such that

$$\sum_i |f(x_i') - f(x_{i-1}')| > V_a^b(f) - \frac{\varepsilon}{2}, \qquad \sum_k |f(x_k'') - f(x_{k-1}'')| > V_b^c(f) - \frac{\varepsilon}{2}.$$

Combining all points of subdivision $x_i'$, $x_k''$, we get a partition of the
interval $[a, c]$, with points of subdivision $x_k$, such that

$$\sum_k |f(x_k) - f(x_{k-1})| = \sum_i |f(x_i') - f(x_{i-1}')| + \sum_k |f(x_k'') - f(x_{k-1}'')| > V_a^b(f) + V_b^c(f) - \varepsilon.$$

Since $\varepsilon > 0$ is arbitrary, it follows that

$$V_a^c(f) \ge V_a^b(f) + V_b^c(f). \tag{9}$$

Comparing (8) and (9), we get (6). ∎

Corollary. The function 


$$v(x) = V_a^x(f) \tag{10}$$

is nondecreasing.

Proof. An immediate consequence of (6), since the total variation of
any function of bounded variation on any interval is nonnegative. ∎

Theorem 3. Let $f$ be a function of bounded variation on $[a, b]$, and let
$v$ be the function (10). Then if $f$ is continuous from the left at a point $x^*$,
so is $v$.

Proof. Given any $\varepsilon > 0$, use the fact that $f$ is continuous from the left
to choose a $\delta > 0$ such that

$$|f(x^*) - f(x)| < \frac{\varepsilon}{2} \tag{11}$$

whenever $0 \le x^* - x < \delta$. Then choose a partition

$$a = x_0 < x_1 < \cdots < x_n = x^*$$

such that

$$V_a^{x^*}(f) - \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| < \frac{\varepsilon}{2}. \tag{12}$$

Here it can be assumed that

$$x^* - x_{n-1} < \delta,$$

since otherwise we need only add an extra point of subdivision, which can
never increase the left-hand side of (12). It follows from (11) and (12)
that

$$V_a^{x^*}(f) - \sum_{k=1}^{n-1} |f(x_k) - f(x_{k-1})| < \varepsilon,$$

and hence

$$V_a^{x^*}(f) - V_a^{x_{n-1}}(f) < \varepsilon$$

a fortiori, i.e.,

$$v(x^*) - v(x_{n-1}) < \varepsilon.$$

But then, since $v$ is nondecreasing,

$$v(x^*) - v(x) < \varepsilon$$

for all $x$ such that $x_{n-1} < x < x^*$. In other words, $v$ is continuous from
the left at $x^*$. ∎

Remark. Virtually the same argument shows that if $f$ is continuous from
the right at $x^*$, then so is $v$. Together with Theorem 3, this shows that if
$f$ is continuous at $x^*$, or on the whole interval $[a, b]$, then so is $v$.

Theorem 4. If $f$ is of bounded variation on $[a, b]$, then $f$ can be represented
as the difference between two nondecreasing functions on $[a, b]$.

Proof. Let

$$v(x) = V_a^x(f),$$

and consider the function

$$g = v - f.$$

Then $g$ is nondecreasing. In fact, if $x' < x''$, then

$$g(x'') - g(x') = [v(x'') - v(x')] - [f(x'') - f(x')]. \tag{13}$$

But

$$|f(x'') - f(x')| \le v(x'') - v(x'),$$

by the very definition of $v$ (by (6), $v(x'') - v(x') = V_{x'}^{x''}(f)$), and hence
the right-hand side of (13) is nonnegative. Writing

$$f = v - g,$$

we get the desired representation of $f$ as the difference between two
nondecreasing functions. ∎
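The construction in this proof can be mirrored on a finite grid. In the sketch below (Python; the name `jordan_decomposition` and the sample function are ours), `v[k]` is the variation sum of the samples over the grid points up to $x_k$, which is only a lower bound for $v(x_k) = V_a^{x_k}(f)$, and `g = v - f`. Each step of `g` equals $|\Delta f| - \Delta f \ge 0$, the same inequality used after (13), so both sequences are nondecreasing and $f = v - g$.

```python
import math
from typing import List, Sequence, Tuple


def jordan_decomposition(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split samples f(x_0), ..., f(x_n) as f = v - g with v, g nondecreasing.

    v[k] is the variation sum of the samples over x_0 < ... < x_k (a discrete
    stand-in for V_a^{x_k}(f)), and g[k] = v[k] - f(x_k).
    """
    v = [0.0]
    for k in range(1, len(values)):
        v.append(v[-1] + abs(values[k] - values[k - 1]))
    g = [vk - fk for vk, fk in zip(v, values)]
    return v, g


xs = [k / 1000 for k in range(1001)]
fs = [math.sin(6 * x) - x for x in xs]  # an arbitrary sample function on [0, 1]
v, g = jordan_decomposition(fs)
print(f"v(1) = {v[-1]:.4f}, g(1) = {g[-1]:.4f}, v(1) - g(1) = {v[-1] - g[-1]:.4f}")
```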

Corollary 1. Every function of bounded variation has a finite derivative
almost everywhere.

Proof. An immediate consequence of Theorem 6, p. 321. ∎

Corollary 2. If $f$ is summable on $[a, b]$, then the indefinite integral

$$F(x) = \int_a^x f(t)\,dt$$

is a function of bounded variation on $[a, b]$.

Proof. Recall the remarks at the beginning of Sec. 9.1. ∎
Problem 1. Prove that $V_a^b(f) = 0$ if and only if $f(x) = \text{const}$ on $[a, b]$.

Problem 2. Prove that the function

$$f(x) = \begin{cases} x^\alpha \sin \dfrac{1}{x^\beta} & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0 \end{cases}$$

is of bounded variation on $[0, 1]$ if $\alpha > \beta$ but not if $\alpha \le \beta$.

Problem 3. Suppose $f$ has a bounded derivative on $[a, b]$, so that $f'(x)$
exists and satisfies an inequality $|f'(x)| \le C$ at every point $x \in [a, b]$. Prove
that $f$ is of bounded variation and

$$V_a^b(f) \le C(b - a).$$

Problem 4. Prove that if $f$ and $g$ are functions of bounded variation on
$[a, b]$, then so is $fg$ and

$$V_a^b(fg) \le V_a^b(f) \sup_x |g(x)| + V_a^b(g) \sup_x |f(x)|.$$

Problem 5. Let $f$ be a function of bounded variation on $[a, b]$ such that

$$f(x) \ge c > 0.$$

Prove that $1/f$ is also a function of bounded variation and

$$V_a^b\left(\frac{1}{f}\right) \le \frac{V_a^b(f)}{c^2}.$$

Problem 6. Prove the converse of Theorem 4. 

Problem 7. Prove that a curve

$$y = f(x) \qquad (a \le x \le b)$$

is rectifiable, i.e., has finite length, as defined in Problem 3, p. 114, if and
only if $f$ is of bounded variation on $[a, b]$.

Problem 8. Let $f$ be a function of bounded variation on $[a, b]$. Prove that

$$\|f\| = V_a^b(f)$$

has all the properties of a norm (cf. p. 138) if we impose the extra condition

$$f(a) = 0.$$

Comment. Thus the space $V_0[a, b]$ of all functions of bounded variation
on $[a, b]$ equipped with this norm and vanishing at $x = a$ is a normed
linear space (addition of functions and multiplication of functions by
numbers being defined in the usual way).

Problem 9. Prove that the space $V_0[a, b]$ defined in the preceding comment
is complete.

Problem 10. Does there exist a continuous function which is not of 
bounded variation on any interval ? 

Hint. Recall Problem 10, p. 327 and Corollary 1 above. 
33. Reconstruction of a Function from Its Derivative

33.1. Statement of the problem. We now address ourselves to the second
of the problems posed on p. 314, i.e., we look for the largest class of functions
$F$ such that

$$\int_a^x F'(t)\,dt = F(x) - F(a), \tag{1}$$

or equivalently

$$F(x) = F(a) + \int_a^x F'(t)\,dt. \tag{2}$$

(As we know from calculus, these formulas hold if $F$ is continuously differentiable.)
From the outset, we must restrict ourselves to functions $F$ which
are differentiable (i.e., have a finite derivative) almost everywhere, since
otherwise (2) would be meaningless. Every function of bounded variation
has this property (see Corollary 1, p. 331). Moreover, the right-hand side of
(2) is a function of bounded variation (see Corollary 2, p. 331). It follows
that the largest class of functions satisfying (2) must be some subset of the
class of functions of bounded variation. Since every function of bounded
variation is the difference between two nondecreasing functions (see Theorem
4, p. 331), we begin by studying nondecreasing functions from the standpoint
of formula (1).

Theorem 1. Let $F$ be a nondecreasing function on $[a, b]$. Then the
derivative $F'$ is summable on $[a, b]$ and

$$\int_a^b F'(t)\,dt \le F(b) - F(a). \tag{3}$$

Proof. Let

$$\Phi_n(t) = n\left[F\left(t + \frac{1}{n}\right) - F(t)\right] \qquad (n = 1, 2, \ldots),$$

where, to make $\Phi_n(t)$ meaningful for all $t \in [a, b]$, we set $F(t) = F(b)$
for $b < t \le b + 1$, by definition.⁸ Clearly

$$F'(t) = \lim_{n \to \infty} \frac{F\left(t + \frac{1}{n}\right) - F(t)}{1/n} = \lim_{n \to \infty} \Phi_n(t)$$

almost everywhere on $[a, b]$. Since $F$ is summable on $[a, b]$, by Theorem
1, p. 316, so is every $\Phi_n$. Integrating, we get

$$\int_a^b \Phi_n(t)\,dt = n \int_a^b \left[F\left(t + \frac{1}{n}\right) - F(t)\right] dt = n\left[\int_{a+1/n}^{b+1/n} F(t)\,dt - \int_a^b F(t)\,dt\right]$$

$$= n\left[\int_b^{b+1/n} F(t)\,dt - \int_a^{a+1/n} F(t)\,dt\right] \le F(b) - F(a),$$

where in the last step we use the fact that $F$ is nondecreasing. The
summability of $F'$ and the inequality (3) now follow at once from Fatou's
theorem (Theorem 3, p. 307). ∎

⁸ Verify that this does not affect the validity of the proof.

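Example 1. It is easy to find nondecreasing functions $F$ for which (3)
becomes a strict inequality, i.e., such that

$$\int_a^b F'(t)\,dt < F(b) - F(a). \tag{4}$$

For example, let

$$F(t) = \begin{cases} 0 & \text{if } 0 \le t < \tfrac{1}{2}, \\ 1 & \text{if } \tfrac{1}{2} \le t \le 1. \end{cases}$$

Then

$$0 = \int_0^1 F'(t)\,dt < F(1) - F(0) = 1.$$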
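Example 1 also shows where the proof of Theorem 1 loses information. The sketch below (Python with exact `fractions.Fraction` arithmetic; the helper names are ours) evaluates $\Phi_n(t) = n[F(t + 1/n) - F(t)]$ for this step function, extended by $F(t) = 1$ for $t > 1$ as in the proof. Each $\Phi_n$ is $n$ times the indicator of $[\tfrac12 - \tfrac1n, \tfrac12)$, so $\int_0^1 \Phi_n(t)\,dt = 1 = F(1) - F(0)$ for $n \ge 2$, while $\Phi_n(t) \to 0 = F'(t)$ at every $t \ne \tfrac12$; Fatou's theorem only yields $\int_0^1 F'(t)\,dt \le 1$, and here the integral is $0$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def F(t: Fraction) -> int:
    """Step function of Example 1 (0 before 1/2, 1 from 1/2 on), with F(t) = 1 for t > 1."""
    return 0 if t < HALF else 1


def Phi(n: int, t: Fraction) -> int:
    """Phi_n(t) = n [F(t + 1/n) - F(t)] from the proof of Theorem 1."""
    return n * (F(t + Fraction(1, n)) - F(t))


def integral_of_Phi(n: int, cells: int = 10_000) -> Fraction:
    """Midpoint-rule integral of Phi_n over [0, 1].

    Exact when `cells` is a multiple of 2 and of n, because Phi_n is then
    constant on every cell.
    """
    h = Fraction(1, cells)
    return h * sum(Phi(n, (k + HALF) * h) for k in range(cells))


for n in (2, 10, 100):
    print(n, integral_of_Phi(n))       # integral of Phi_n over [0, 1]
print(Phi(1000, Fraction(49, 100)))    # Phi_n(0.49) once 1/n < 0.01
```

Every printed integral equals $1$, whereas $\Phi_{1000}(0.49) = 0$.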


Example 2 (The Cantor function). In the preceding example, $F$ is discontinuous.
However, it is also possible to find continuous nondecreasing functions
satisfying the strict inequality (4). To this end, let

$$[a_1^{(1)}, b_1^{(1)}] = \left[\tfrac{1}{3}, \tfrac{2}{3}\right]$$

be the middle third of the interval $[0, 1]$, let

$$[a_1^{(2)}, b_1^{(2)}] = \left[\tfrac{1}{9}, \tfrac{2}{9}\right], \qquad [a_2^{(2)}, b_2^{(2)}] = \left[\tfrac{7}{9}, \tfrac{8}{9}\right]$$

be the middle thirds of the intervals remaining after deleting $[a_1^{(1)}, b_1^{(1)}]$ from
$[0, 1]$, let

$$[a_1^{(3)}, b_1^{(3)}] = \left[\tfrac{1}{27}, \tfrac{2}{27}\right], \qquad [a_2^{(3)}, b_2^{(3)}] = \left[\tfrac{7}{27}, \tfrac{8}{27}\right],$$

$$[a_3^{(3)}, b_3^{(3)}] = \left[\tfrac{19}{27}, \tfrac{20}{27}\right], \qquad [a_4^{(3)}, b_4^{(3)}] = \left[\tfrac{25}{27}, \tfrac{26}{27}\right]$$

be the middle thirds of the intervals remaining after deleting $[a_1^{(1)}, b_1^{(1)}]$,
$[a_1^{(2)}, b_1^{(2)}]$ and $[a_2^{(2)}, b_2^{(2)}]$ from $[0, 1]$, and so on, with

$$[a_1^{(n)}, b_1^{(n)}], \ldots, [a_{2^{n-1}}^{(n)}, b_{2^{n-1}}^{(n)}]$$

being the $2^{n-1}$ intervals deleted at the $n$th stage. Note that the complement of the
union of all the intervals $[a_k^{(n)}, b_k^{(n)}]$ is the set of all “points of the second
kind” of the Cantor set constructed in Example 4, p. 52, i.e., all points of the
and so on, as shown schematically in Figure 19. Then $F$ is defined everywhere
on $[0, 1]$ except at points of the second kind of the Cantor set. Given any
such point $t^*$, let $\{t_n\}$ be an increasing sequence of points of the type (5)
converging to $t^*$, and let $\{t_n'\}$ be a decreasing sequence of points of the same
type converging to $t^*$ (why do such sequences exist?). Then let

$$F(t^*) = \lim_{n \to \infty} F(t_n) = \lim_{n \to \infty} F(t_n')$$

(justify the equality of the limits). Completing the definition of $F$ in this way,
we obtain a continuous nondecreasing function on the whole interval $[0, 1]$,
known as the Cantor function. (Fill in some missing details.) The derivative
$F'$ obviously vanishes at every interior point of the intervals $[a_k^{(n)}, b_k^{(n)}]$, and



Figure 19 
hence vanishes almost everywhere, since the sum of the lengths of these
intervals equals

$$\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = \sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = 1$$

(the Cantor set is of measure zero). It follows that

$$0 = \int_0^1 F'(t)\,dt < F(1) - F(0) = 1.$$
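For experiments, the Cantor function can be evaluated from ternary digits. The sketch below (Python; the function names are ours) relies on a standard description that is equivalent to the construction above but is not stated in the text: read the ternary digits of $t$ until the first digit 1, turning each digit 2 into a binary digit 1; meeting a digit 1 means $t$ lies in one of the intervals $[a_k^{(n)}, b_k^{(n)}]$, on which $F$ is constant. The sketch also checks that the partial sums of the series of deleted lengths equal $1 - (2/3)^n$.

```python
from fractions import Fraction


def cantor(t: Fraction, depth: int = 60) -> Fraction:
    """Cantor function F(t) on [0, 1], within 2**-depth of the exact value.

    Ternary digits 0 and 2 of t become binary digits 0 and 1 of F(t); the first
    ternary digit 1 puts t in a deleted interval, where F is constant.
    """
    if t >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        t *= 3
        digit = int(t)          # next ternary digit: 0, 1 or 2
        t -= digit
        if digit == 1:
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value


def deleted_length(stages: int) -> Fraction:
    """Total length of the 2**(n-1) intervals deleted at stages n = 1, ..., stages."""
    return sum((Fraction(2 ** (n - 1), 3 ** n) for n in range(1, stages + 1)), Fraction(0))


print(cantor(Fraction(1, 3)), cantor(Fraction(7, 9)), float(cantor(Fraction(1, 4))))
print(deleted_length(10), 1 - Fraction(2, 3) ** 10)
```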

33.2. Absolutely continuous functions. We have just given examples of 
functions for which formula (1) does not hold. To describe the class of 
functions satisfying (1), or equivalently (2), we will need the following 

Definition. A function $f$ defined on an interval $[a, b]$ is said to be
absolutely continuous on $[a, b]$ if, given any $\varepsilon > 0$, there is a $\delta > 0$ such that

$$\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \varepsilon$$

for every finite system of pairwise disjoint subintervals

$$(a_k, b_k) \subset [a, b] \qquad (k = 1, \ldots, n)$$

of total length

$$\sum_{k=1}^{n} (b_k - a_k)$$

less than $\delta$.

Remark 1. Clearly every absolutely continuous function is uniformly
continuous, as we see by choosing a single subinterval $(a_1, b_1) \subset [a, b]$.
However, a uniformly continuous function need not be absolutely continuous.
For example, the Cantor function $F$ constructed in Example 2 of the preceding
section is continuous (and hence uniformly continuous) on $[0, 1]$, but not
absolutely continuous on $[0, 1]$. In fact, the Cantor set can be covered by a
finite system of subintervals $(a_k, b_k)$ of arbitrarily small total length (why?).
But obviously

$$\sum_{k=1}^{n} |F(b_k) - F(a_k)| = 1$$

for every such system. The same example shows that a function of bounded
variation need not be absolutely continuous. On the other hand, an absolutely
continuous function is necessarily of bounded variation (see Theorem 2).
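The covering argument of Remark 1 can be checked with exact arithmetic. The sketch below (Python; `cantor_cover` is a hypothetical helper of ours) lists the $2^n$ closed intervals left after $n$ stages of the construction in Example 2, together with the values of $F$ at their end points. It uses the self-similarity of the Cantor function, an assumption not spelled out in the text though it follows from the construction: $F$ is constant on each deleted middle third, with value equal to the average of its values at the two ends of the interval being split. The total length is $(2/3)^n \to 0$, while the increments of $F$ always add up to $1$.

```python
from fractions import Fraction


def cantor_cover(stages: int):
    """Closed intervals [a, b] left after `stages` middle-third deletions,
    each returned as (a, b, F(a), F(b)) for the Cantor function F."""
    pieces = [(Fraction(0), Fraction(1), Fraction(0), Fraction(1))]
    for _ in range(stages):
        refined = []
        for a, b, fa, fb in pieces:
            third = (b - a) / 3
            middle = (fa + fb) / 2  # value of F on the deleted middle third
            refined.append((a, a + third, fa, middle))
            refined.append((b - third, b, middle, fb))
        pieces = refined
    return pieces


for n in (1, 4, 12):
    cover = cantor_cover(n)
    length = sum(b - a for a, b, _, _ in cover)
    increment = sum(fb - fa for _, _, fa, fb in cover)
    print(n, len(cover), float(length), increment)
```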

Remark 2. In the definition, we can change “finite” to “finite or countable.”
In fact, given any $\varepsilon > 0$, choose $\varepsilon'$ with $0 < \varepsilon' < \varepsilon$ and a
$\delta > 0$ such that

$$\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \varepsilon'$$

for every finite system of pairwise disjoint intervals $(a_k, b_k) \subset [a, b]$ of total
length less than $\delta$, and consider any countable system of pairwise disjoint
intervals $(\alpha_k, \beta_k) \subset [a, b]$ of total length less than $\delta$. Then obviously

$$\sum_{k=1}^{n} |f(\beta_k) - f(\alpha_k)| < \varepsilon'$$

for every $n$. Hence, taking the limit as $n \to \infty$, we get

$$\sum_{k=1}^{\infty} |f(\beta_k) - f(\alpha_k)| \le \varepsilon' < \varepsilon.$$

Theorem 2. If $f$ is absolutely continuous on $[a, b]$, then $f$ is of bounded
variation on $[a, b]$.

Proof. Given any $\varepsilon > 0$, there is a $\delta > 0$ such that

$$\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \varepsilon$$

for every system of pairwise disjoint intervals $(a_k, b_k) \subset [a, b]$ such that

$$\sum_{k=1}^{n} (b_k - a_k) < \delta.$$

Hence if $[\alpha, \beta]$ is any interval of length less than $\delta$, we have

$$V_\alpha^\beta(f) \le \varepsilon.$$

Let

$$a = x_0 < x_1 < \cdots < x_N = b$$

be a partition of $[a, b]$ into $N$ subintervals $[x_{k-1}, x_k]$ all of length less
than $\delta$. Then, by Theorem 2, p. 329,

$$V_a^b(f) \le N\varepsilon < \infty.$$

∎

Theorem 3. If $f$ is absolutely continuous on $[a, b]$, then so is $\alpha f$, where
$\alpha$ is any constant. Moreover, if $f$ and $g$ are absolutely continuous on $[a, b]$,
then so is $f + g$.

Proof. An immediate consequence of the definition of absolute continuity
and obvious properties of the absolute value. ∎

It follows from Theorems 2 and 3 (together with Remark 1) that the set
of all absolutely continuous functions on $[a, b]$ is a proper subspace of the
linear space of all functions of bounded variation on $[a, b]$.

Theorem 4. If $f$ is absolutely continuous on $[a, b]$, then $f$ can be represented
as the difference between two absolutely continuous nondecreasing
functions on $[a, b]$.

Proof. By Theorem 2, $f$ is of bounded variation on $[a, b]$, and hence
can be represented in the form

$$f = v - g,$$

where

$$v(x) = V_a^x(f), \qquad g = v - f$$

are the same nondecreasing functions as in Theorem 4, p. 331. We now
verify that $v$ and $g$ are absolutely continuous. Given any $\varepsilon > 0$, choose
$\varepsilon'$ with $0 < \varepsilon' < \varepsilon$ and let $\delta > 0$ be such that

$$\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \varepsilon'$$

for every finite system of pairwise disjoint subintervals $(a_k, b_k) \subset [a, b]$
of total length less than $\delta$. Consider the sum

$$\sum_{k=1}^{n} |v(b_k) - v(a_k)| = \sum_{k=1}^{n} [v(b_k) - v(a_k)],$$

equal to the least upper bound of the sums

$$\sum_{k=1}^{n} \sum_{l=1}^{m_k} |f(x_{k,l}) - f(x_{k,l-1})| \tag{6}$$

taken over all possible finite partitions

$$a_1 = x_{1,0} < x_{1,1} < \cdots < x_{1,m_1} = b_1,$$

$$\cdots$$

$$a_k = x_{k,0} < x_{k,1} < \cdots < x_{k,m_k} = b_k,$$

$$\cdots$$

$$a_n = x_{n,0} < x_{n,1} < \cdots < x_{n,m_n} = b_n$$

of the intervals $(a_1, b_1), \ldots, (a_n, b_n)$. The total length of all the intervals
$(x_{k,l-1}, x_{k,l})$ figuring in (6) is clearly less than $\delta$, and hence the sum (6) is
less than $\varepsilon'$, by the absolute continuity of $f$. Therefore

$$\sum_{k=1}^{n} |v(b_k) - v(a_k)| \le \varepsilon' < \varepsilon,$$

i.e., $v$ is absolutely continuous on $[a, b]$. It follows from Theorem 3
that $g = v - f$ is also absolutely continuous on $[a, b]$. ∎

We now study the close connection between absolute continuity and the 
indefinite Lebesgue integral: 

Theorem 5. The indefinite integral

$$F(x) = \int_a^x f(t)\,dt$$

of a summable function $f$ is absolutely continuous.

Proof. Given any finite collection of pairwise disjoint intervals
$(a_k, b_k)$, we have

$$\sum_{k=1}^{n} |F(b_k) - F(a_k)| = \sum_{k=1}^{n} \left|\int_{a_k}^{b_k} f(t)\,dt\right| \le \sum_{k=1}^{n} \int_{a_k}^{b_k} |f(t)|\,dt = \int_{\bigcup_k (a_k, b_k)} |f(t)|\,dt.$$

But the last expression on the right approaches zero as the total length
of the intervals $(a_k, b_k)$ approaches zero, by the absolute continuity of
the Lebesgue integral (Theorem 6, p. 300). ∎


Lemma. Let $f$ be an absolutely continuous nondecreasing function on
$[a, b]$ such that $f'(x) = 0$ almost everywhere. Then $f(x) = \text{const}$.

Proof. Since $f$ is continuous and nondecreasing, its range is the closed
interval $[f(a), f(b)]$. We will show that the length of this interval is zero
if $f'(x) = 0$ almost everywhere, thereby proving the lemma. Let $E$ be
the set of points $x \in [a, b]$ such that $f'(x) = 0$, and let $Z = [a, b] - E$,
where $\mu(Z) = 0$, by hypothesis. Given any $\varepsilon > 0$, we find $\delta > 0$ such
that

$$\sum_k |f(b_k) - f(a_k)| < \varepsilon \tag{7}$$

for any finite or countable system of pairwise disjoint intervals $(a_k, b_k) \subset
[a, b]$ of total length less than $\delta$ (recall Remark 2, p. 336), and then cover $Z$
by an open set of measure less than $\delta$ (this is possible, since $Z$ is of measure
zero). In other words, we cover $Z$ by a finite or countable system of
intervals $(a_k, b_k)$ of total length less than $\delta$. It then follows from (7) that
the whole system of intervals, and hence (a fortiori) the set

$$Z \subset \bigcup_k (a_k, b_k),$$

is mapped into a set of measure less than $\varepsilon$. But then $\mu[f(Z)] = 0$,
since $\varepsilon > 0$ is arbitrary.

Next consider the set $E = [a, b] - Z$, and let $x_0 \in E$. Then, since
$f'(x_0) = 0$, we have

$$\frac{f(x) - f(x_0)}{x - x_0} < \varepsilon$$

for all $x > x_0$ sufficiently near $x_0$, i.e.,

$$f(x) - f(x_0) < \varepsilon(x - x_0)$$

or

$$\varepsilon x_0 - f(x_0) < \varepsilon x - f(x).$$

Therefore the point $x_0$ is invisible from the right with respect to the
function $\varepsilon x - f(x)$. It follows from Lemma 1, p. 319 that $E$ is contained in the
union of no more than countably many pairwise disjoint intervals $(\alpha_k, \beta_k)$,
with end points satisfying the inequalities

$$\varepsilon \alpha_k - f(\alpha_k) \le \varepsilon \beta_k - f(\beta_k)$$

or

$$f(\beta_k) - f(\alpha_k) \le \varepsilon(\beta_k - \alpha_k).$$

But then

$$\sum_k [f(\beta_k) - f(\alpha_k)] \le \varepsilon \sum_k (\beta_k - \alpha_k) \le \varepsilon(b - a).$$

In other words, $f$ maps $E$ into a set covered by a system of intervals of
total length at most $\varepsilon(b - a)$. Therefore $\mu[f(E)] = 0$, since $\varepsilon > 0$
is arbitrary.

We have just shown that the sets $f(Z)$ and $f(E)$ are both of measure
zero. But the interval $[f(a), f(b)]$ is the union of $f(Z)$ and $f(E)$. It
follows that $[f(a), f(b)]$ is of length zero, i.e., that $f(x) = \text{const}$. ∎

We are now in a position to prove 

Theorem 6 (Lebesgue). If $F$ is absolutely continuous on $[a, b]$, then
the derivative $F'$ is summable on $[a, b]$ and

$$F(x) = F(a) + \int_a^x F'(t)\,dt. \tag{8}$$

Proof. We need only consider the case of nondecreasing $F$ (why?).
Then $F'$ is summable, by Theorem 1, and the function

$$\Phi(x) = F(x) - \int_a^x F'(t)\,dt \tag{9}$$

is also nondecreasing. In fact, if $x'' > x'$, then

$$\Phi(x'') - \Phi(x') = F(x'') - F(x') - \int_{x'}^{x''} F'(t)\,dt \ge 0,$$

where we again use Theorem 1. Moreover, $\Phi$ is absolutely continuous,
being the difference between two absolutely continuous functions (recall
Theorems 3 and 5), and $\Phi'(x) = 0$ almost everywhere, by Theorem 8,
p. 324. It follows from the lemma that $\Phi(x) = \text{const}$. Setting $x = a$,
we find that this constant equals $F(a)$. Replacing $\Phi(x)$ by $F(a)$ in (9),
we get (8). ∎

Remark. Combining Theorems 5 and 6, we can now give a definitive
answer to the second of the questions posed on p. 314 (see also p. 333):
The formula

$$\int_a^x F'(t)\,dt = F(x) - F(a),$$

or equivalently

$$F(x) = F(a) + \int_a^x F'(t)\,dt,$$

holds for all $x \in [a, b]$ if and only if $F$ is absolutely continuous on $[a, b]$.
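As a numerical sanity check of this Remark (not a proof), take $F(x) = \sqrt{x}$ on $[0, 1]$, an example of our own choosing: it is absolutely continuous by Theorem 5, being the indefinite integral of the summable function $F'(t) = 1/(2\sqrt{t})$, even though $F'$ is unbounded near $0$. The sketch below (Python) approximates $\int_0^x F'(t)\,dt$ with the midpoint rule, which never evaluates $F'$ at $t = 0$, and compares the result with $F(x) - F(0)$. Convergence is slow because of the singularity at $0$ (the error is of order $\sqrt{h}$ for cell width $h$).

```python
import math
from typing import Callable


def midpoint_integral(g: Callable[[float], float], a: float, b: float, cells: int) -> float:
    """Midpoint rule on [a, b]; g is evaluated only at cell midpoints."""
    h = (b - a) / cells
    return h * sum(g(a + (k + 0.5) * h) for k in range(cells))


def F(x: float) -> float:
    return math.sqrt(x)


def F_prime(t: float) -> float:
    return 0.5 / math.sqrt(t)  # unbounded as t -> 0, but summable on [0, 1]


for x in (0.25, 1.0):
    approx = midpoint_integral(F_prime, 0.0, x, 100_000)
    print(f"x = {x}: integral of F' = {approx:.5f}, F(x) - F(0) = {F(x) - F(0.0):.5f}")
```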
33.3. The Lebesgue decomposition. Let $f$ be a function of bounded variation
on $[a, b]$. Then it follows from Theorem 4, p. 331 and Problem 9, p. 327
that $f$ can (in general) be represented as a sum

$$f(x) = \varphi(x) + \psi(x), \tag{10}$$

where $\varphi$ is a continuous function of bounded variation and $\psi$ is a jump
function.⁹ Now let

$$\varphi_1(x) = \int_a^x \varphi'(t)\,dt, \qquad \varphi_2(x) = \varphi(x) - \varphi_1(x). \tag{11}$$

Then $\varphi_1$ is absolutely continuous, while $\varphi_2$ is a continuous function of bounded
variation such that

$$\varphi_2'(x) = \varphi'(x) - \frac{d}{dx}\int_a^x \varphi'(t)\,dt = 0$$

almost everywhere. A continuous function of bounded variation is said to
be singular if its derivative vanishes almost everywhere. For example, the
Cantor function $F$ constructed in Example 2, p. 334 is singular. Combining
(10) and (11), we find that a function $f$ of bounded variation can (in general)
be represented as a sum

$$f(x) = \varphi_1(x) + \varphi_2(x) + \psi(x) \tag{12}$$

of an absolutely continuous function $\varphi_1$, a singular function $\varphi_2$ and a jump
function $\psi$. Formula (12) is known as the Lebesgue decomposition.

Remark. Differentiating (12), we get

$$f'(x) = \varphi_1'(x)$$

almost everywhere. Thus integration of the derivative of a function of 
bounded variation does not restore the function itself, but only its absolutely 
continuous “component,” while the other two components, i.e., the singular 
function and the jump function, “disappear without a trace.” 

Problem 1. Prove that a function $f$ is absolutely continuous on $[a, b]$ if
and only if it is a continuous function of bounded variation mapping every
subset $Z \subset [a, b]$ of measure zero into a set of measure zero.

⁹ Generalizing Problem 9, p. 327, by a jump function we now mean a function of
the form

$$\sum_{x_n < x} h_n + \sum_{x_n' \le x} h_n',$$

where the numbers $h_1, \ldots, h_n, \ldots$ and $h_1', \ldots, h_n', \ldots$ corresponding to the
discontinuity points $x_1, \ldots, x_n, \ldots$ and $x_1', \ldots, x_n', \ldots$ satisfy the conditions

$$\sum_n |h_n| < \infty, \qquad \sum_n |h_n'| < \infty$$

(we now allow negative $h_n$, $h_n'$).
Problem 2. Verify directly from the definition on p. 336 that the function

$$f(x) = \begin{cases} x \sin \dfrac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$$

fails to be absolutely continuous on any interval $[a, b]$ containing the point
$x = 0$.

Problem 3. Prove that if a function $f$ satisfies a Lipschitz condition

$$|f(x') - f(x'')| \le K|x' - x''|$$

for all $x', x'' \in [a, b]$, then $f$ is absolutely continuous on $[a, b]$.

Problem 4. Prove that each of the terms $\varphi_1$, $\varphi_2$ and $\psi$ in the Lebesgue
decomposition (12) is unique to within an additive constant.

Comment. The stipulation “to within an additive constant” can be
dropped if we require the function $f$ and its “components” to vanish at $x = a$,
say, or if we agree to regard all functions differing by a constant as equivalent.

Problem 5. Let $A_0[a, b]$ be the space of all absolutely continuous functions
$f$ defined on $[a, b]$, satisfying the condition $f(a) = 0$. Prove that $A_0[a, b]$ is
a closed subspace of the space of all functions of bounded variation
on $[a, b]$ satisfying the same condition, equipped with the norm $\|f\| = V_a^b(f)$.

Comment. There is no need for the condition $f(a) = 0$ if we regard all
functions differing by a constant as equivalent. We then have $\|f\| = 0$ if
and only if $f = \text{const}$.

Problem 6. Starting from a locally summable function $f$, i.e., a function
summable on every finite interval, define the corresponding generalized
function $f$ and generalized derivative $f'$ by the formulas

$$(f, \varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \qquad (f', \varphi) = -\int_{-\infty}^{\infty} f(x)\varphi'(x)\,dx$$

as in Sec. 21.2. (Here $\varphi$ is any test function, i.e., any infinitely differentiable
function of finite support.) Prove that the generalized derivative $f'$ determines
$f$ to within an additive constant. Apply this to the case of the function

$$f(x) = \begin{cases} 0 & \text{if } x < 0, \\ F(x) & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1, \end{cases}$$

where $F$ is the Cantor function constructed in Example 2, p. 334.

Hint. See Theorem 1, p. 213.

Problem 7. Let $f$ and $f'$ be the same as in the preceding problem, and
suppose $f$ is of bounded variation on $(-\infty, \infty)$. Then $f$ has an ordinary
derivative almost everywhere. Let $f_1$ be the generalized function corresponding
to $df/dx$, so that

$$(f_1, \varphi) = \int_{-\infty}^{\infty} \frac{df}{dx}\,\varphi(x)\,dx.$$

Prove that

a) In general, $f_1$ does not equal the generalized derivative $f'$;

b) If $f$ is absolutely continuous, then $f_1 = f'$;

c) If $f_1 = f'$, then $f$ is equivalent to an absolutely continuous function¹⁰
and, in particular, is absolutely continuous if it is continuous.

Hint. In a), consider the function

$$f(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0. \end{cases}$$

Comment. Problems 6 and 7 further illustrate the situation discussed 
on pp. 206-207. To carry out the operations of analysis (in this case,
reconstruction of a function from its derivative), we can either restrict the class of
admissible functions (by requiring them to be absolutely continuous) or else 
extend the notion of function itself (at the same time, extending the notion 
of a derivative). 


34. The Lebesgue Integral as a Set Function 

34.1. Charges. The Hahn and Jordan decompositions. As we now show,
the theory developed in Secs. 31-33 for functions defined on the real line
$(-\infty, \infty)$ continues to make sense in a much more general setting. Let $X$
be a space (i.e., some “master set”) equipped with a measure $\mu$, and let $f$
be a $\mu$-summable function defined on $X$. Then $f$ is summable on every
measurable subset $E \subset X$, so that the integral

$$\Phi(E) = \int_E f(x)\,d\mu \tag{1}$$

(for fixed $f$) defines a set function on the system of all $\mu$-measurable
subsets of $X$. By Theorem 4, p. 298, $\Phi$ is $\sigma$-additive, i.e., if a measurable
set $E$ is a finite or countable union

$$E = \bigcup_n E_n$$

10 I.e., coincides almost everywhere with an absolutely continuous function.

of pairwise disjoint measurable sets $E_n$, then

$$\Phi(E) = \sum_n \Phi(E_n).$$

In other words, the set function (1) has all the properties of a $\sigma$-additive
measure except that it may not be nonnegative in the case where $f$ takes
negative values. These considerations suggest

Definition 1. A $\sigma$-additive set function defined on a $\sigma$-ring (in
particular, a $\sigma$-algebra) of subsets of a space $X$ and in general taking
values of both signs is called a signed measure or charge (on $X$).

Remark. Thus the notion of a measure is equivalent to that of a nonnegative
charge.

In the case of electrical charge distributed on a surface, we can divide 
the surface into two regions, one carrying positive charge (i.e., such that 
every part of the region is positively charged) and one carrying negative 
charge. We will establish the mathematical equivalent of this fact in a 
moment, after first introducing 

Definition 2. Let $\Phi$ be a charge defined on a $\sigma$-algebra $\mathcal{S}$ of subsets
of a space $X$. Then a set $A \subset X$ is said to be negative with respect to $\Phi$
if $E \cap A \in \mathcal{S}$ and $\Phi(E \cap A) \le 0$ for every $E \in \mathcal{S}$. Similarly, $A$ is said
to be positive with respect to $\Phi$ if $E \cap A \in \mathcal{S}$ and $\Phi(E \cap A) \ge 0$ for
every $E \in \mathcal{S}$.

Theorem 1. Given a charge $\Phi$ on a space $X$, there is a measurable set
$A^- \subset X$ such that $A^-$ is negative and $A^+ = X - A^-$ is positive with
respect to $\Phi$.

Proof. Let

$$a = \inf \Phi(A),$$

where the greatest lower bound is taken over all measurable negative
sets $A$. Let $\{A_n\}$ be a sequence of measurable negative sets such that

$$\lim_{n \to \infty} \Phi(A_n) = a.$$

Then

$$A^- = \bigcup_n A_n$$

is a measurable negative set such that

$$\Phi(A^-) = a$$

(why?). To show that $A^-$ is the required set, we must now prove that
$A^+ = X - A^-$ is positive. Suppose $A^+$ is not positive. Then $A^+$ contains
a measurable subset $B_0$ such that $\Phi(B_0) < 0$. However, $B_0$ cannot be
negative, since if it were, the set $A = A^- \cup B_0$ would be a negative set
such that $\Phi(A) < a$, which is impossible. Hence there is a least positive
integer $k_1$ such that $B_0$ contains a subset $B_1$ satisfying the condition

$$\Phi(B_1) \ge \frac{1}{k_1}.$$

Obviously $B_1 \ne B_0$. Applying the same argument to the set $B_0 - B_1$,
we find a least positive integer $k_2$ such that $B_0 - B_1$ contains a subset $B_2$
satisfying the inequality

$$\Phi(B_2) \ge \frac{1}{k_2} \qquad (k_2 \ge k_1)$$

(explain why $k_2 \ge k_1$), a least positive integer $k_3$ such that $(B_0 - B_1) - B_2$
contains a subset $B_3$ satisfying the inequality

$$\Phi(B_3) \ge \frac{1}{k_3} \qquad (k_3 \ge k_2),$$

and so on. Now let

$$F = B_0 - \bigcup_{n=1}^{\infty} B_n.$$

Clearly $\Phi(F) = \Phi(B_0) - \sum_n \Phi(B_n) < 0$, since $\Phi(B_0) < 0$ while $\Phi(B_n) > 0$
for all $n \ge 1$; in particular, $F$ is nonempty. Moreover, $F$ is negative by
construction (think things through). Hence the set $A = A^- \cup F$ is again
negative and $\Phi(A) < a$, which is impossible. This contradiction shows that
$A^+ = X - A^-$ must be positive. ∎

Thus we can represent $X$ as a union

$$X = A^+ \cup A^- \tag{2}$$

of two disjoint measurable sets $A^+$ and $A^-$, where $A^+$ is positive and $A^-$ is
negative with respect to the charge $\Phi$. The representation (2) is called the
Hahn decomposition of $X$, and may not be unique. However, if

$$X = A_1^+ \cup A_1^-, \qquad X = A_2^+ \cup A_2^-$$

are two distinct Hahn decompositions of $X$, then

$$\Phi(E \cap A_1^-) = \Phi(E \cap A_2^-), \qquad \Phi(E \cap A_1^+) = \Phi(E \cap A_2^+) \tag{3}$$

for every $E \in \mathcal{S}$. In fact,

$$E \cap (A_1^- - A_2^-) \subset E \cap A_1^- \tag{4}$$

and at the same time

$$E \cap (A_1^- - A_2^-) \subset E \cap A_2^+. \tag{5}$$

But (4) implies

$$\Phi(E \cap (A_1^- - A_2^-)) \le 0,$$

while (5) implies

$$\Phi(E \cap (A_1^- - A_2^-)) \ge 0.$$

Therefore

$$\Phi(E \cap (A_1^- - A_2^-)) = 0, \tag{6}$$

and similarly

$$\Phi(E \cap (A_2^- - A_1^-)) = 0. \tag{7}$$

It follows from (6) and (7) that

$$\Phi(E \cap A_1^-) = \Phi(E \cap (A_1^- - A_2^-)) + \Phi(E \cap (A_1^- \cap A_2^-))$$

$$= \Phi(E \cap (A_2^- - A_1^-)) + \Phi(E \cap (A_1^- \cap A_2^-)) = \Phi(E \cap A_2^-),$$

which proves the first of the formulas (3). The second formula is proved
in exactly the same way.

Thus a charge $\Phi$ on a space $X$ uniquely determines two nonnegative set
functions, namely

$$\Phi^+(E) = \Phi(E \cap A^+), \qquad \Phi^-(E) = -\Phi(E \cap A^-),$$

called the positive variation and negative variation of $\Phi$, respectively. It is
clear that

1) $\Phi = \Phi^+ - \Phi^-$;

2) $\Phi^+$ and $\Phi^-$ are nonnegative $\sigma$-additive set functions, i.e., measures;

3) The set function $|\Phi| = \Phi^+ + \Phi^-$, called the total variation of $\Phi$, is
also a measure.

The representation

$$\Phi = \Phi^+ - \Phi^-$$

of a charge $\Phi$ as the difference between its positive and negative variations
is called the Jordan decomposition of $\Phi$.
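On a finite space every charge is determined by its values on single points, and a Hahn decomposition can be read off from their signs. The sketch below (Python; the weights and helper names are illustrative choices of ours) builds one Hahn decomposition, checks positivity and negativity over all subsets, and forms $\Phi^+$, $\Phi^-$ and $|\Phi|$. Because the point `"r"` carries zero charge, it may be placed in either $A^+$ or $A^-$, so the Hahn decomposition is not unique, while $\Phi^+$ and $\Phi^-$ come out the same either way, as (3) asserts.

```python
from itertools import combinations

# A charge on the finite space X = {p, q, r, s}: Phi(E) = sum of the point weights in E.
weights = {"p": 2.0, "q": -1.5, "r": 0.0, "s": -0.5}
X = set(weights)


def charge(E):
    return sum(weights[x] for x in E)


def subsets(S):
    """All subsets of S (the sigma-algebra of all subsets of a finite set)."""
    items = sorted(S)
    return [set(c) for size in range(len(items) + 1) for c in combinations(items, size)]


# One Hahn decomposition: nonnegative weights go to A+, negative ones to A-.
A_plus = {x for x in X if weights[x] >= 0}
A_minus = X - A_plus
assert all(charge(E & A_plus) >= 0 and charge(E & A_minus) <= 0 for E in subsets(X))


def phi_plus(E):  # positive variation
    return charge(E & A_plus)


def phi_minus(E):  # negative variation
    return -charge(E & A_minus)


def total_variation(E):  # |Phi| = Phi+ + Phi-
    return phi_plus(E) + phi_minus(E)


# A second Hahn decomposition, moving the zero-weight point r into the negative set.
B_plus, B_minus = A_plus - {"r"}, A_minus | {"r"}
print(sorted(A_plus), sorted(B_plus), phi_plus(X), phi_minus(X), total_variation(X))
```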

34.2. Classification of charges. The Radon-Nikodym theorem. We now
classify charges on a space $X$ equipped with a measure:

Definition 3. Let $\mu$ be a $\sigma$-additive measure on a $\sigma$-algebra $\mathcal{S}$ of
($\mu$-measurable) subsets of a space $X$, and let $\Phi$ be a charge defined on $\mathcal{S}$.
Then $\Phi$ is said to be concentrated on a set $A \in \mathcal{S}$ if $\Phi(E) = 0$ for every
measurable set $E \subset X - A$.

Definition 4. Let $\mu$, $\mathcal{S}$, $X$ and $\Phi$ be the same as in Definition 3.
Then $\Phi$ is said to be

1) Continuous if $\Phi(E) = 0$ for every single-element set $E \subset X$ of
measure zero;

2) Singular if $\Phi$ is concentrated on a set of measure zero;

3) Discrete if $\Phi$ is concentrated on a finite or countable set of measure
zero;

4) Absolutely continuous (with respect to $\mu$) if $\Phi(E) = 0$ for every
measurable set $E$ such that $\mu(E) = 0$.

Clearly, the Lebesgue integral

$$\Phi(E) = \int_E \varphi(x)\, d\mu$$

of a fixed summable function $\varphi$ is absolutely continuous with respect to the
measure $\mu$, since the integral of a summable function over a set of measure
zero vanishes. As we will see in a moment, every absolutely continuous charge
can be represented in this form. But first we need the following

Lemma. Let $\mu$ be a σ-additive measure defined on a σ-algebra $\mathcal{S}$ of
subsets of a space X, and let $\Phi$ be another such measure defined on $\mathcal{S}$.
Suppose $\Phi$ is absolutely continuous with respect to $\mu$ and is not identically
zero. Then there is a positive integer n and a set $A \in \mathcal{S}$ such that $\mu(A) > 0$
and A is positive with respect to the charge $\Phi - (1/n)\mu$.

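Proof. Let

$$X = A_n^- \cup A_n^+ \qquad (n = 1, 2, \ldots)$$

be the Hahn decomposition corresponding to the charge $\Phi - (1/n)\mu$,
and let

$$A_0^- = \bigcap_{n=1}^{\infty} A_n^-, \qquad A_0^+ = \bigcup_{n=1}^{\infty} A_n^+.$$

Then, since $A_0^- \subset A_n^-$,

$$\Phi(A_0^-) \le \frac{1}{n}\,\mu(A_0^-)$$

for all n = 1, 2, ..., i.e., $\Phi(A_0^-) = 0$ (recall that $\Phi$ is nonnegative), and hence
$\Phi(A_0^+) > 0$ since $X = A_0^- \cup A_0^+$ and $\Phi$ is not identically zero. But then
$\mu(A_0^+) > 0$, by the absolute continuity of $\Phi$. Hence there is an n such that
$\mu(A_n^+) > 0$ (why?). This n and the set $A = A_n^+$ satisfy the conditions of the lemma.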

Theorem 2 (Radon-Nikodym). Let $\mu$ be a σ-additive measure defined
on a σ-algebra $\mathcal{S}$ of subsets of a space X, and let $\Phi$ be a charge defined on
$\mathcal{S}$. Suppose $\Phi$ is absolutely continuous with respect to $\mu$. Then there is a
$\mu$-summable function $\varphi$ on X such that

$$\Phi(E) = \int_E \varphi(x)\, d\mu \qquad (8)$$

for every $E \in \mathcal{S}$. The function $\varphi$ is unique to within its values on a set
of $\mu$-measure zero.
Proof. We can assume that $\Phi$ is not identically zero, since otherwise
we need only choose $\varphi$ to be any function equal to zero almost everywhere
(discuss the uniqueness of $\varphi$ in this case). Let K be the set of all $\mu$-summable
functions f on X such that

$$\int_E f(x)\, d\mu \le \Phi(E)$$

for every $E \in \mathcal{S}$, and let

$$M = \sup_{f \in K} \int_X f(x)\, d\mu.$$

Moreover, let $\{f_n\}$ be a sequence of functions in K such that

$$\lim_{n \to \infty} \int_X f_n(x)\, d\mu = M,$$

and let

$$g_n(x) = \max\{f_1(x), \ldots, f_n(x)\}.$$

Then clearly

$$g_1(x) \le g_2(x) \le \cdots \le g_n(x) \le \cdots. \qquad (9)$$

Moreover,

$$\int_E g_n(x)\, d\mu \le \Phi(E) \qquad (10)$$

for every $E \in \mathcal{S}$. In fact, E can be written in the form

$$E = \bigcup_{k=1}^{n} E_k,$$

where the sets $E_k \in \mathcal{S}$ are pairwise disjoint and $g_n(x) = f_k(x)$ on
$E_k$, and hence

$$\int_E g_n(x)\, d\mu = \sum_{k=1}^{n} \int_{E_k} f_k(x)\, d\mu \le \sum_{k=1}^{n} \Phi(E_k) = \Phi(E).$$

In particular, it follows from (10) that $g_n \in K$, so that

$$\int_X g_n(x)\, d\mu \le M.$$

But then

$$\lim_{n \to \infty} \int_X g_n(x)\, d\mu = M,$$

since otherwise, because $f_n \le g_n$,

$$\lim_{n \to \infty} \int_X f_n(x)\, d\mu \le \lim_{n \to \infty} \int_X g_n(x)\, d\mu < M,$$

contrary to the choice of the sequence $\{f_n\}$. Writing

$$\varphi(x) = \sup_n g_n(x) = \lim_{n \to \infty} g_n(x),$$

we find from (9) and Levi's theorem (Theorem 2, p. 305) that

$$\int_X \varphi(x)\, d\mu = \lim_{n \to \infty} \int_X g_n(x)\, d\mu = M, \qquad (11)$$

and in the same way, by (10), $\int_E \varphi(x)\, d\mu = \lim_{n \to \infty} \int_E g_n(x)\, d\mu \le \Phi(E)$ for every $E \in \mathcal{S}$.

Next we show that $\varphi$ is the required function, figuring in the
representation (8). By construction, the set function

$$\lambda(E) = \Phi(E) - \int_E \varphi(x)\, d\mu$$

is nonnegative and in fact is a σ-additive measure. If $\lambda \not\equiv 0$, then,
by the lemma, there is an $\varepsilon > 0$ and a set $A \in \mathcal{S}$ such that $\mu(A) > 0$
and

$$\varepsilon\,\mu(E \cap A) \le \lambda(E \cap A)$$

for every $E \in \mathcal{S}$. Let

$$h(x) = \varphi(x) + \varepsilon\,\chi_A(x),$$

where^11

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$

Then

$$\int_E h(x)\, d\mu = \int_E \varphi(x)\, d\mu + \varepsilon\,\mu(E \cap A) \le \int_{E - A} \varphi(x)\, d\mu + \Phi(E \cap A) \le \Phi(E),$$

so that h belongs to the set K introduced at the beginning of the proof.
On the other hand, it follows from (11) that

$$\int_X h(x)\, d\mu = \int_X \varphi(x)\, d\mu + \varepsilon\,\mu(A) > M,$$

contrary to the definition of M. Therefore $\lambda \equiv 0$, which is equivalent
to (8).

Finally, to prove that $\varphi$ is unique to within its values on a set of
measure zero, suppose

$$\Phi(E) = \int_E \varphi(x)\, d\mu = \int_E \varphi^*(x)\, d\mu$$

for all $E \in \mathcal{S}$. Then, by Chebyshev's inequality (Theorem 5, p. 299),
we have

$$\mu(A_m) \le m \int_{A_m} [\varphi(x) - \varphi^*(x)]\, d\mu = 0$$

for every set

$$A_m = \{x : \varphi(x) - \varphi^*(x) > 1/m\} \qquad (m = 1, 2, \ldots),$$

and similarly

$$\mu(B_n) = 0$$

for every set

$$B_n = \{x : \varphi^*(x) - \varphi(x) > 1/n\} \qquad (n = 1, 2, \ldots).$$

But

$$\{x : \varphi(x) \ne \varphi^*(x)\} = \bigcup_m A_m \cup \bigcup_n B_n,$$

and hence

$$\mu\{x : \varphi(x) \ne \varphi^*(x)\} = 0,$$

i.e., $\varphi(x) = \varphi^*(x)$ almost everywhere. ∎

11 $\chi_A$ is called the characteristic function of the set A.
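On a finite space the Radon-Nikodym theorem becomes elementary, and a small computation shows what the density is. In the Python sketch below we assume hypothetical point masses for $\mu$ and $\Phi$: absolute continuity then means $\Phi(\{x\}) = 0$ whenever $\mu(\{x\}) = 0$, and the density is the ratio $\Phi(\{x\})/\mu(\{x\})$ on points of positive measure. Exact fractions are used so that formula (8) can be checked on every subset without rounding; the arbitrary value chosen on $\mu$-null points illustrates the uniqueness "to within a set of $\mu$-measure zero." All function names are ours.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite space X = {p, q, r, s}: mu and Phi are given by point masses.
mu = {"p": Fraction(1, 2), "q": Fraction(1, 4), "r": Fraction(0), "s": Fraction(1, 4)}
Phi = {"p": Fraction(3, 2), "q": Fraction(-1, 2), "r": Fraction(0), "s": Fraction(0)}


def is_absolutely_continuous(Phi, mu):
    """On a finite space, Phi << mu iff Phi({x}) = 0 whenever mu({x}) = 0."""
    return all(Phi[x] == 0 for x in mu if mu[x] == 0)


def rn_derivative(Phi, mu):
    """Density dPhi/dmu; its values on mu-null points are arbitrary (we use 0)."""
    return {x: Phi[x] / mu[x] if mu[x] != 0 else Fraction(0) for x in mu}


def integral(f, E, mu):
    """Integral of f over E with respect to the point-mass measure mu."""
    return sum((f[x] * mu[x] for x in E), Fraction(0))


def satisfies_formula_8(Phi, mu, density):
    """Check Phi(E) = integral of the density over E for every subset E of X."""
    points = list(mu)
    return all(
        integral(density, E, mu) == sum((Phi[x] for x in E), Fraction(0))
        for r in range(len(points) + 1)
        for E in combinations(points, r)
    )


density = rn_derivative(Phi, mu)   # 3 at p, -2 at q, 0 at r and s
print(satisfies_formula_8(Phi, mu, density))
```

Changing the density at the $\mu$-null point `r` to any other value leaves every integral, and hence formula (8), unchanged.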

Remark 1. The function $\varphi$ figuring in the representation (8) is called the
Radon-Nikodym derivative (or simply the density) of the charge $\Phi$ with
respect to the measure $\mu$, and is denoted

$$\frac{d\Phi}{d\mu}.$$

Clearly, Theorem 2 is the natural generalization of Lebesgue's theorem
(Theorem 6, p. 340), which states that an absolutely continuous function
F is the integral of its own derivative F'. However, in the case of a function
F defined on the real line there is an explicit procedure for finding the
derivative of F at a point $x_0$, namely evaluation of the limit

$$\lim_{\Delta x \to 0} \frac{\Delta F}{\Delta x} = \lim_{\Delta x \to 0} \frac{F(x_0 + \Delta x) - F(x_0)}{\Delta x},$$

whereas the Radon-Nikodym theorem only establishes the existence of the
derivative $d\Phi/d\mu$, without telling how to find it. However, an explicit
procedure can be given for evaluating $d\Phi/d\mu$ at a point $x_0 \in X$ by calculating
the limit

$$\lim_{\varepsilon \to 0} \frac{\Phi(A_\varepsilon)}{\mu(A_\varepsilon)},$$

where $\{A_\varepsilon\}$ is a system of sets "converging to the point $x_0$" as $\varepsilon \to 0$, in a
suitably defined sense.^12
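A simple case of this procedure can be computed directly. The Python sketch below assumes a hypothetical charge $\Phi(E) = \int_E 3x^2\,dx$ on the line, with $\mu$ ordinary Lebesgue measure, and takes $A_\varepsilon = [x_0 - \varepsilon, x_0 + \varepsilon]$ as the sets "converging to $x_0$"; the general notion of convergence is the one treated in the reference of footnote 12 and is not reproduced here.

```python
def Phi_interval(a, b):
    """Phi([a, b]) for the hypothetical charge Phi(E) = integral of 3x^2 over E.

    The antiderivative of 3x^2 is x^3, so the value is exact."""
    return b ** 3 - a ** 3


def density_estimate(x0, eps):
    """Ratio Phi(A_eps) / mu(A_eps) with A_eps = [x0 - eps, x0 + eps], mu = Lebesgue measure."""
    return Phi_interval(x0 - eps, x0 + eps) / (2 * eps)


estimates = [density_estimate(2.0, eps) for eps in (1e-1, 1e-2, 1e-3)]
print(estimates)
```

Algebraically the ratio equals $3x_0^2 + \varepsilon^2$, so at $x_0 = 2$ the estimates decrease toward the density value $d\Phi/d\mu = 12$.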


12 For the details, see G. E. Shilov and B. L. Gurevich, Integral, Measure and Derivative:
A Unified Approach (translated by R. A. Silverman), Prentice-Hall, Inc., Englewood
Cliffs, N.J. (1966), Chap. 10.
Remark 2. It can also be shown^13 that an arbitrary charge $\Phi$ has a unique
representation as the sum

$$\Phi(E) = A(E) + S(E) + D(E)$$

of an absolutely continuous charge A, a singular charge S and a discrete
charge D. This is the exact analogue of the Lebesgue decomposition on
p. 341.

Problem 1. Given any charge $\Phi$ defined on a σ-algebra $\mathcal{S}$, prove that
there is a constant M > 0 such that $|\Phi(E)| \le M$ for all $E \in \mathcal{S}$.

Problem 2. Give an example of two distinct Hahn decompositions of a 
space X. 

Problem 3. Prove that a charge $\Phi$ vanishes identically if it is both
absolutely continuous and singular with respect to a measure $\mu$.

Problem 4. Prove that if a charge $\Phi$ is concentrated on a set $A_0$, then so
are its positive, negative and total variations.

Problem 5. Prove that 

a) Every absolutely continuous charge is continuous; 

b) Every discrete charge is singular. 

Problem 6. Prove that if a charge is absolutely continuous (with
respect to a measure $\mu$), then so are its positive, negative and total variations.

Problem 7. Prove that if a charge $\Phi$ is discrete, then there are no more
than countably many points $x_1, x_2, \ldots, x_n, \ldots$ and corresponding real
numbers $h_1, h_2, \ldots, h_n, \ldots$ such that $\mu(\{x_n\}) = 0$ and

$$\Phi(E) = \sum_{x_n \in E} h_n.$$

Write expressions for the positive, negative and total variations of $\Phi$.

Problem 8. Let X be the square $0 \le x \le 1$, $0 \le y \le 1$ equipped with
ordinary two-dimensional Lebesgue measure $\mu$, and let $\Phi(E)$ be the ordinary
one-dimensional Lebesgue measure of the intersection of E with the interval
$0 \le x \le 1$. Prove that $\Phi$ is continuous and singular, but not absolutely
continuous.

13 G. E. Shilov and B. L. Gurevich, op. cit., Chap. 9. 
35. Product Measures. Fubini's Theorem

The problem of reducing double (or multiple) integrals to iterated integrals 
plays an important role in classical analysis. In the Lebesgue theory, the key 
result along these lines is Fubini’s theorem, proved in Sec. 35.3. En route 
to Fubini’s theorem we will need the preliminary topics treated in Secs. 35.1 
and 35.2, which are also of interest in their own right. 

35.1. Direct products of sets and measures. By the direct (or Cartesian)
product of two sets X and Y, denoted by X × Y, we mean the set of all
ordered pairs (x, y) where $x \in X$, $y \in Y$. Similarly, by the direct product of
n sets $X_1, X_2, \ldots, X_n$, denoted by

$$X_1 \times X_2 \times \cdots \times X_n, \qquad (1)$$

we mean the set of all ordered n-tuples $(x_1, x_2, \ldots, x_n)$, where $x_1 \in X_1$,
$x_2 \in X_2, \ldots, x_n \in X_n$. In particular, if

$$X_1 = X_2 = \cdots = X_n = X,$$

we write (1) simply as $X^n$, the "nth power of X."

Example 1. Real n-space $R^n$ is the nth power of the real line $R^1$, as
anticipated by the notation.

Example 2. The unit cube $I^n$ in n-space, i.e., the set of all elements of $R^n$
with coordinates satisfying the inequalities

$$0 \le x_k \le 1 \qquad (k = 1, 2, \ldots, n),$$

is the nth power of the closed unit interval $I^1 = [0, 1]$.
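For finite sets the direct product and the nth power can be built literally as sets of ordered tuples. The following Python sketch is only an illustration on hypothetical finite sets; it uses `itertools.product` from the standard library, and the helper names are ours.

```python
from itertools import product


def direct_product(*sets):
    """X_1 x X_2 x ... x X_n as a set of ordered n-tuples."""
    return set(product(*sets))


def nth_power(X, n):
    """X^n, the nth power of X."""
    return set(product(X, repeat=n))


X1, X2 = {0, 1}, {"a", "b", "c"}
P = direct_product(X1, X2)
print(len(P), (0, "a") in P, ("a", 0) in P)   # pairs are ordered
```

The number of elements of a finite direct product is the product of the numbers of elements of the factors, so $\{0, 1\}^3$ has 8 elements.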


Now let $\mathcal{S}_1, \ldots, \mathcal{S}_n$ be systems of subsets of the sets $X_1, X_2, \ldots,
X_n$, respectively. Then by

$$\mathfrak{S} = \mathcal{S}_1 \times \cdots \times \mathcal{S}_n$$

we mean the system of subsets of the direct product (1) which can be
represented in the form

$$A = A_1 \times A_2 \times \cdots \times A_n,$$

where

$$A_k \in \mathcal{S}_k \qquad (k = 1, 2, \ldots, n).$$

If

$$\mathcal{S}_1 = \mathcal{S}_2 = \cdots = \mathcal{S}_n = \mathcal{S},$$

then $\mathfrak{S}$ is the "nth power of $\mathcal{S}$," written

$$\mathfrak{S} = \mathcal{S}^n.$$

For example, the system of all closed rectangular parallelepipeds in $R^n$ is the
nth power of the system of all closed intervals in $R^1$.

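Theorem 1. If $\mathcal{S}_1, \ldots, \mathcal{S}_n$ are semirings, then so is the set

$$\mathfrak{S} = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_n.$$

Proof. By the definition of a semiring (see p. 32), we must show that^1

a) If $A, B \in \mathfrak{S}$, then $A \cap B \in \mathfrak{S}$;

b) If $A, B \in \mathfrak{S}$ and $B \subset A$, then A can be represented as a finite
union

$$A = \bigcup_{k=1}^{m} C^{(k)}$$

of pairwise disjoint sets $C^{(k)} \in \mathfrak{S}$, with $B = C^{(1)}$.

It is clearly enough to prove these assertions for the case n = 2. Thus
suppose $A \in \mathcal{S}_1 \times \mathcal{S}_2$, $B \in \mathcal{S}_1 \times \mathcal{S}_2$. Then

$$A = A_1 \times A_2 \quad (A_1 \in \mathcal{S}_1,\ A_2 \in \mathcal{S}_2), \qquad B = B_1 \times B_2 \quad (B_1 \in \mathcal{S}_1,\ B_2 \in \mathcal{S}_2), \qquad (2)$$

and hence

$$A \cap B = (A_1 \times A_2) \cap (B_1 \times B_2) = (A_1 \cap B_1) \times (A_2 \cap B_2).$$

But $A_1 \cap B_1 \in \mathcal{S}_1$, $A_2 \cap B_2 \in \mathcal{S}_2$ since $\mathcal{S}_1$ and $\mathcal{S}_2$ are semirings. It
follows that $A \cap B \in \mathcal{S}_1 \times \mathcal{S}_2$. This proves a).

To prove b), suppose that

$$B_1 \subset A_1, \qquad B_2 \subset A_2,$$

1 Note that the empty set ∅ belongs to $\mathfrak{S}$, since ∅ = ∅ × ∅ × ⋯ × ∅ (why?).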
in addition to (2). Then, since $\mathcal{S}_1$ and $\mathcal{S}_2$ are semirings, there are finite
expansions

$$A_1 = B_1 \cup B_1^{(1)} \cup \cdots \cup B_1^{(k)}, \qquad A_2 = B_2 \cup B_2^{(1)} \cup \cdots \cup B_2^{(l)},$$

where the sets $B_1, B_1^{(1)}, \ldots, B_1^{(k)}$ are pairwise disjoint and belong to $\mathcal{S}_1$,
while the sets $B_2, B_2^{(1)}, \ldots, B_2^{(l)}$ are pairwise disjoint and belong to $\mathcal{S}_2$.
Therefore

$$\begin{aligned} A = A_1 \times A_2 = {} & (B_1 \times B_2) \cup (B_1 \times B_2^{(1)}) \cup \cdots \cup (B_1 \times B_2^{(l)}) \\ & \cup (B_1^{(1)} \times B_2) \cup (B_1^{(1)} \times B_2^{(1)}) \cup \cdots \cup (B_1^{(1)} \times B_2^{(l)}) \\ & \cup \cdots \cup (B_1^{(k)} \times B_2) \cup (B_1^{(k)} \times B_2^{(1)}) \cup \cdots \cup (B_1^{(k)} \times B_2^{(l)}) \end{aligned}$$

is the desired finite expansion of $A_1 \times A_2$, where $B_1 \times B_2 = B$ is the first term
and the other terms are pairwise disjoint and belong to $\mathfrak{S} = \mathcal{S}_1 \times \mathcal{S}_2$. ∎
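The expansion in part b) can be checked concretely for rectangles in the plane. In the Python sketch below we assume, for illustration, that $\mathcal{S}_1 = \mathcal{S}_2$ is the semiring of half-open intervals $[a, b)$ with numeric endpoints. Each $A_k - B_k$ then splits into at most two intervals, and the product of the pieces gives at most 9 pairwise disjoint rectangles with $B$ as the first term. Summing their areas also previews the additivity of $\mu_1 \times \mu_2$ discussed below. The helper names are ours.

```python
from itertools import product

# Sketch: the semiring of half-open intervals [a, b) on the line, each stored as (a, b).


def split_interval(A, B):
    """Write A = [a, b) as B = [c, d) (assumed B contained in A) plus disjoint leftovers."""
    (a, b), (c, d) = A, B
    pieces = [(c, d)]            # B itself comes first
    if a < c:
        pieces.append((a, c))
    if d < b:
        pieces.append((d, b))
    return pieces


def rectangle_expansion(A, B):
    """Expand A = A1 x A2 into pairwise disjoint rectangles, the first one being B = B1 x B2."""
    (A1, A2), (B1, B2) = A, B
    return list(product(split_interval(A1, B1), split_interval(A2, B2)))


def area(R):
    """Product measure of a rectangle: length of one side times length of the other."""
    (a, b), (c, d) = R
    return (b - a) * (d - c)


def disjoint(R, S):
    """Half-open rectangles are disjoint iff their projections on some axis are disjoint."""
    return any(r[1] <= s[0] or s[1] <= r[0] for r, s in zip(R, S))


A = ((0, 4), (0, 3))
B = ((1, 2), (1, 2))
pieces = rectangle_expansion(A, B)
print(len(pieces), pieces[0] == B, sum(area(R) for R in pieces) == area(A))
```

If $B$ touches the boundary of $A$, some leftover intervals are empty and are simply omitted, so fewer than 9 rectangles appear.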

Now let $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_n$ be n semirings, equipped with measures

$$\mu_1(A_1), \ldots, \mu_n(A_n) \qquad (A_k \in \mathcal{S}_k), \qquad (3)$$

and let $\mu$ be the measure on the semiring $\mathfrak{S} = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_n$
defined by the formula

$$\mu(A) = \mu_1(A_1)\,\mu_2(A_2) \cdots \mu_n(A_n)$$

for every $A = A_1 \times A_2 \times \cdots \times A_n$. Then $\mu$ is called the direct (or Cartesian)
product^2 of the measures (3), and is denoted by

$$\mu = \mu_1 \times \mu_2 \times \cdots \times \mu_n.$$

To confirm that $\mu$ is indeed a measure, we now show that $\mu$ is additive ($\mu$ is
obviously real and nonnegative). It will again be enough to consider the
case n = 2. Suppose

$$A = A_1 \times A_2 = \bigcup_{k=1}^{t} B^{(k)}, \qquad (4)$$

where

$$B^{(i)} \cap B^{(j)} = \varnothing \qquad (i \ne j)$$

and

$$B^{(k)} = B_1^{(k)} \times B_2^{(k)}.$$

According to Lemma 2, p. 33, there are finite expansions

$$A_1 = \bigcup_{m=1}^{r} C_1^{(m)}, \qquad A_2 = \bigcup_{n=1}^{s} C_2^{(n)},$$

2 The term product measure will be used with a different meaning below. 
each involving pairwise disjoint sets, such that each $B_1^{(k)}$ is a finite union

$$B_1^{(k)} = \bigcup_{m \in M_k} C_1^{(m)}$$

of certain of the sets $C_1^{(m)}$, while each $B_2^{(k)}$ is a finite union

$$B_2^{(k)} = \bigcup_{n \in N_k} C_2^{(n)}$$

of certain of the sets $C_2^{(n)}$ (here $M_k$ denotes some subset of the set {1, 2, ..., r}
and $N_k$ some subset of the set {1, 2, ..., s}). But then, by the additivity of
$\mu_1$ and $\mu_2$ we have

$$\begin{aligned} \mu(A) = \mu_1(A_1)\,\mu_2(A_2) &= \sum_{m=1}^{r} \mu_1(C_1^{(m)}) \sum_{n=1}^{s} \mu_2(C_2^{(n)}) \\ &= \sum_{k=1}^{t} \sum_{m \in M_k} \sum_{n \in N_k} \mu_1(C_1^{(m)})\,\mu_2(C_2^{(n)}) \\ &= \sum_{k=1}^{t} \mu_1(B_1^{(k)})\,\mu_2(B_2^{(k)}) = \sum_{k=1}^{t} \mu(B^{(k)}), \end{aligned}$$

which, when compared with (4), shows the additivity of $\mu = \mu_1 \times \mu_2$.

Example 3. Thus the additivity of area of rectangles in the plane follows 
from the additivity of length of intervals on the line. 

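Theorem 2. If the measures $\mu_1, \mu_2, \ldots, \mu_n$ are σ-additive, then so is
the measure $\mu = \mu_1 \times \mu_2 \times \cdots \times \mu_n$.

Proof. Again we need only consider the case n = 2. Let $\lambda_1$ denote the
Lebesgue extension of the measure $\mu_1$, and suppose

$$C = \bigcup_{n=1}^{\infty} C_n,$$

where the sets $C_n$ are pairwise disjoint and the sets C, $C_n$ belong to
$\mathcal{S}_1 \times \mathcal{S}_2$, i.e.,

$$C = A \times B \quad (A \in \mathcal{S}_1,\ B \in \mathcal{S}_2), \qquad C_n = A_n \times B_n \quad (A_n \in \mathcal{S}_1,\ B_n \in \mathcal{S}_2).$$

Moreover, let

$$f_n(x) = \begin{cases} \mu_2(B_n) & \text{if } x \in A_n, \\ 0 & \text{if } x \notin A_n. \end{cases}$$

We then have

$$\sum_{n=1}^{\infty} f_n(x) = \mu_2(B) \qquad \text{if } x \in A,$$

and hence, by the corollary on p. 307,

$$\sum_{n=1}^{\infty} \int_A f_n(x)\, d\lambda_1 = \int_A \mu_2(B)\, d\lambda_1 = \lambda_1(A)\,\mu_2(B) = \mu_1(A)\,\mu_2(B) = \mu(C). \qquad (5)$$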



356 MORE ON INTEGRATION 


PRODUCT MEASURES. FUBINl’S THEOREM 357 


CHAP. 10 

But

$$\int_A f_n(x)\, d\lambda_1 = \mu_1(A_n)\,\mu_2(B_n) = \mu(C_n). \qquad (6)$$

Substituting (6) into (5), we get

$$\mu(C) = \sum_{n=1}^{\infty} \mu(C_n). \quad ∎$$

Again let $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_n$ be n semirings, this time equipped with
σ-additive measures (3). Then it follows from Theorem 2 that the measure^3

$$m = \mu_1 \times \mu_2 \times \cdots \times \mu_n \qquad (7)$$

is σ-additive on the semiring

$$\mathfrak{S} = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_n.$$

Therefore, as in Sec. 27, m has a Lebesgue extension $\mu$ defined on a σ-ring
$\mathfrak{S}_\mu$. This measure $\mu$ is called the product measure of the measures (3),
and is denoted by

$$\mu = \mu_1 \otimes \mu_2 \otimes \cdots \otimes \mu_n. \qquad (8)$$

The distinction between the meaning of the symbols × and ⊗ in (7) and
(8) is crucial.

Example 4. Let

$$\mu_1 = \mu_2 = \cdots = \mu_n = \mu^1,$$

where $\mu^1$ is ordinary Lebesgue measure on the line. Then the product
measure (8) is ordinary Lebesgue measure in n-space.

35.2. Evaluation of a product measure. Let G be a region in the xy-plane
bounded by the vertical lines x = a, x = b (a < b) and the curves y = f(x),
y = g(x), where $f(x) \le g(x)$. Then it will be recalled from calculus that the
area of G is given by the integral

$$\int_a^b [g(x) - f(x)]\, dx,$$

where the difference $g(x_0) - f(x_0)$ is just the length of the segment in which
the vertical line $x = x_0$ intersects the region G. As we now show, the natural
generalization of this method can be used to evaluate an arbitrary product
measure:
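The calculus formula can be compared numerically with a direct "product measure" computation. The Python sketch below assumes the hypothetical region $G = \{(x, y) : x^2 \le y \le x,\ 0 \le x \le 1\}$, whose exact area is $1/6$. One function integrates the section lengths $g(x) - f(x)$ by the midpoint rule; the other counts grid squares of side $1/n$ (each of measure $1/n \cdot 1/n$) whose centres lie in $G$. This is only a numerical illustration, not the proof given below.

```python
def section_length(x):
    """Length of the vertical section of G = {(x, y): x^2 <= y <= x} at abscissa x."""
    return max(0.0, x - x * x)


def area_by_sections(n=1000):
    """Midpoint-rule approximation of the integral of [g(x) - f(x)] over [0, 1]."""
    h = 1.0 / n
    return sum(section_length((i + 0.5) * h) for i in range(n)) * h


def area_by_grid(n=400):
    """Count grid squares of side 1/n whose centres lie in G; each has measure h * h."""
    h = 1.0 / n
    count = 0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if x * x <= y <= x:
                count += 1
    return count * h * h


print(area_by_sections(), area_by_grid(), 1 / 6)
```

The section method converges much faster here, because integrating section lengths only requires a one-dimensional quadrature.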

Theorem 3. Let $\mu$ be the product measure

$$\mu = \mu_x \otimes \mu_y$$

3 We change to the symbol m here, to "free" $\mu$ for use in formula (8).
of two measures $\mu_x$ and $\mu_y$ such that

1) $\mu_x$ is σ-additive on a Borel algebra $\mathcal{S}_x$ of subsets of a set X;

2) $\mu_y$ is σ-additive on a Borel algebra $\mathcal{S}_y$ of subsets of a set Y;

3) $\mu_x$ and $\mu_y$ are complete, in the sense that $B \subset A$ and $\mu_x(A) = 0$
implies that B is measurable (with measure zero), and similarly for
$\mu_y$.^4

Then

$$\mu(A) = \int_X \mu_y(A_x)\, d\mu_x = \int_Y \mu_x(A_y)\, d\mu_y \qquad (9)$$

for every $\mu$-measurable set A, where^5

$$A_x = \{y : (x, y) \in A\} \quad (x \text{ fixed}), \qquad A_y = \{x : (x, y) \in A\} \quad (y \text{ fixed}).$$

Proof. We note in passing that the integral over X in (9) reduces to
an integral over the set

$$\bigcup_{y \in Y} A_y,$$

outside which $\mu_y(A_x)$ vanishes (and similarly for the integral over Y).

It will be enough to prove that

$$\mu(A) = \int_X \varphi_A(x)\, d\mu_x, \qquad (10)$$

where

$$\varphi_A(x) = \mu_y(A_x),$$

since the other part of (9) is proved in exactly the same way. Observe
that implicit in the theorem is the conclusion that the set $A_x$ is $\mu_y$-measurable
for almost all x (in the sense of the measure $\mu_x$) and that the function
$\varphi_A(x)$ is $\mu_x$-measurable, since otherwise (10) would be meaningless.

The measure $\mu$ is the Lebesgue extension of the measure

$$m = \mu_x \times \mu_y$$

defined on the semiring $\mathfrak{S}_m$ of all sets of the form

$$A = A_{y_0} \times A_{x_0} \qquad (A_{y_0} \in \mathcal{S}_x,\ A_{x_0} \in \mathcal{S}_y),$$

and we denote by $\mathfrak{S}_\mu$ the Borel algebra of $\mu$-measurable subsets of X × Y. But
(10) obviously holds for all sets $A \in \mathfrak{S}_m$, since for them

$$\varphi_A(x) = \begin{cases} \mu_y(A_{x_0}) & \text{if } x \in A_{y_0}, \\ 0 & \text{if } x \notin A_{y_0}, \end{cases}$$

so that $\int_X \varphi_A(x)\, d\mu_x = \mu_x(A_{y_0})\,\mu_y(A_{x_0}) = m(A)$.

4 The Lebesgue extension of any measure is complete (see Problem 7, p. 280).

5 If X is the x-axis and Y the y-axis (so that X × Y is the xy-plane), then $A_{x_0}$ is the
projection onto the y-axis of the set in which the vertical line $x = x_0$ intersects the set A
(and similarly for $A_{y_0}$).
Moreover, (10) carries over at once to the ring $\mathfrak{R}(\mathfrak{S}_m)$ generated by
$\mathfrak{S}_m$, since $\mathfrak{R}(\mathfrak{S}_m)$ is just the system of all sets which can be represented
as finite unions of pairwise disjoint sets of $\mathfrak{S}_m$ (recall Theorem 3, p. 34).

To prove (10) for an arbitrary set $A \in \mathfrak{S}_\mu$, we recall from Theorem 8,
p. 277 that there are sets

$$B_{nk} \in \mathfrak{R}(\mathfrak{S}_m) \qquad (B_{n1} \subset B_{n2} \subset \cdots \subset B_{nk} \subset \cdots)$$

and corresponding sets

$$B_n = \bigcup_k B_{nk} \qquad (B_1 \supset B_2 \supset \cdots \supset B_n \supset \cdots)$$

such that

$$A \subset B = \bigcap_n B_n, \qquad \mu(A) = \mu(B). \qquad (11)$$

Clearly,

$$\varphi_{B_n}(x) = \lim_{k \to \infty} \varphi_{B_{nk}}(x), \qquad \varphi_{B_{n1}}(x) \le \varphi_{B_{n2}}(x) \le \cdots \le \varphi_{B_{nk}}(x) \le \cdots,$$

$$\varphi_B(x) = \lim_{n \to \infty} \varphi_{B_n}(x), \qquad \varphi_{B_1}(x) \ge \varphi_{B_2}(x) \ge \cdots \ge \varphi_{B_n}(x) \ge \cdots.$$

Hence we can invoke Levi's theorem^6 to extend (10) from the ring $\mathfrak{R}(\mathfrak{S}_m)$
to the system of all sets $B \in \mathfrak{S}_\mu$ of the form

$$\bigcap_n \bigcup_k B_{nk} \qquad (B_{nk} \in \mathfrak{R}(\mathfrak{S}_m)). \qquad (12)$$

Moreover, if $\mu(A) = 0$, then $\mu(B) = 0$, because of (11), and hence

$$\varphi_B(x) = \mu_y(B_x) = 0$$

almost everywhere. Therefore $A_x$ is measurable and

$$\varphi_A(x) = \mu_y(A_x) = 0$$

for almost all x, since $A_x \subset B_x$. But then

$$\int_X \varphi_A(x)\, d\mu_x = 0 = \mu(A).$$

In other words, (10) holds for all sets of measure zero, as well as for all
sets of the form (12). But, according to (11), an arbitrary set $A \in \mathfrak{S}_\mu$
can be represented as

$$A = B - Z,$$

where B is of the form (12) and Z is of measure zero. Therefore

$$B = A \cup Z \qquad (A \cap Z = \varnothing).$$

6 See Theorem 2, p. 305 and Problem 2, p. 311.

It follows that

$$\mu(A) = \mu(B) = \int_X \varphi_B(x)\, d\mu_x = \int_X \varphi_A(x)\, d\mu_x + \int_X \varphi_Z(x)\, d\mu_x = \int_X \varphi_A(x)\, d\mu_x,$$

i.e., (10) holds for every $A \in \mathfrak{S}_\mu$. ∎

Example 1. Let M be any $\mu_x$-measurable set, and let f be an integrable
nonnegative function. Moreover, let Y be the y-axis, and let $\mu_y$ be ordinary
Lebesgue measure on the line. Consider the set

$$A = \{(x, y) : x \in M,\ 0 \le y \le f(x)\}. \qquad (13)$$

Then

$$\varphi_A(x) = \mu_y(A_x) = \begin{cases} f(x) & \text{if } x \in M, \\ 0 & \text{if } x \notin M, \end{cases}$$

and hence, by Theorem 3,

$$\mu(A) = \int_X \varphi_A(x)\, d\mu_x = \int_M f(x)\, d\mu_x. \qquad (14)$$

This allows us to interpret the Lebesgue integral of a nonnegative function
over a set $M \subset X$ in terms of the $\mu$-measure of the set (13), where

$$\mu = \mu_x \otimes \mu_y.$$

Example 2. In the preceding example, let X be the x-axis and let M be a
closed interval [a, b]. Moreover, suppose f is nonnegative and Riemann-integrable
on [a, b]. Then (14) reduces to the familiar formula

$$\mu(A) = \int_a^b f(x)\, dx$$

for the area under the graph of the function y = f(x) between x = a and
x = b.

35.3. Fubini’s theorem. The next theorem is basic in the theory of 
multiple integration: 

Theorem 4 (Fubini). Let $\mu_x$ and $\mu_y$ be the same as in Theorem 3, let $\mu$
be the product measure $\mu_x \otimes \mu_y$, and let f(x, y) be $\mu$-integrable on the set
$A \subset X \times Y$. Then

$$\int_A f(x, y)\, d\mu = \int_X \left( \int_{A_x} f(x, y)\, d\mu_y \right) d\mu_x = \int_Y \left( \int_{A_y} f(x, y)\, d\mu_x \right) d\mu_y. \qquad (15)$$

Proof. Note that implicit in the theorem is the conclusion that the
"inner integrals" in parentheses exist for almost all values of the variable
over which they are integrated (x in the first case, y in the second). We
begin by assuming temporarily that $f(x, y) \ge 0$. Consider the triple
Cartesian product

$$U = X \times Y \times Z,$$

where Z is the real line, equipped with the product measure

$$\mu_u = \mu_x \otimes \mu_y \otimes \mu^1 = \mu \otimes \mu^1 = \mu_x \otimes (\mu_y \otimes \mu^1)$$

(see Problem 3), where $\mu^1$ is ordinary Lebesgue measure on the line.
Moreover, consider the set $W \subset U$ defined by

$$W = \{(x, y, z) : x \in A_y,\ y \in A_x,\ 0 \le z \le f(x, y)\}.$$

By (14),

$$\mu_u(W) = \int_A f(x, y)\, d\mu. \qquad (16)$$

On the other hand, by Theorem 3,

$$\mu_u(W) = \int_X \lambda(W_x)\, d\mu_x, \qquad (17)$$

where

$$\lambda = \mu_y \otimes \mu^1, \qquad W_x = \{(y, z) : (x, y, z) \in W\} \quad (x \text{ fixed}).$$

Using (14) again, we obtain

$$\lambda(W_x) = \int_{A_x} f(x, y)\, d\mu_y. \qquad (18)$$

Comparing (16)-(18), we get part of (15). The rest of (15) is proved in
exactly the same way. To remove the restriction that f(x, y) be nonnegative,
we merely note that

$$f(x, y) = f^+(x, y) - f^-(x, y),$$

where the functions

$$f^+(x, y) = \frac{|f(x, y)| + f(x, y)}{2}, \qquad f^-(x, y) = \frac{|f(x, y)| - f(x, y)}{2}$$

are both nonnegative. ∎

Remark. Thus Fubini's theorem asserts that if the "double integral"

$$I = \int_A f(x, y)\, d\mu \qquad (19)$$

exists, then so do the "iterated integrals"

$$I_{xy} = \int_X \left( \int_{A_x} f(x, y)\, d\mu_y \right) d\mu_x, \qquad I_{yx} = \int_Y \left( \int_{A_y} f(x, y)\, d\mu_x \right) d\mu_y, \qquad (20)$$

and moreover $I = I_{xy} = I_{yx}$.
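For measures concentrated on finitely many points, Fubini's theorem reduces to rearranging a finite double sum, which makes the statement easy to check. The Python sketch below assumes hypothetical point masses $\mu_x$, $\mu_y$ and a hypothetical integrand $f$ on all of $X \times Y$; the product measure gives mass $\mu_x(\{x\})\,\mu_y(\{y\})$ to the point $(x, y)$. Exact fractions are used, and the double integral (19) and both iterated integrals (20) agree.

```python
from fractions import Fraction

# Hypothetical finite spaces with point-mass measures mu_x and mu_y.
mu_x = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
mu_y = {"u": Fraction(2), "v": Fraction(1, 4)}


def f(x, y):
    """A hypothetical integrand on X x Y."""
    return Fraction(x + 1) * (3 if y == "u" else -5)


def double_integral(f):
    """Integral over X x Y with respect to mu_x (x) mu_y."""
    return sum(f(x, y) * mu_x[x] * mu_y[y] for x in mu_x for y in mu_y)


def iterated_xy(f):
    """Integrate over Y first (inner), then over X (outer)."""
    return sum(sum(f(x, y) * mu_y[y] for y in mu_y) * mu_x[x] for x in mu_x)


def iterated_yx(f):
    """Integrate over X first (inner), then over Y (outer)."""
    return sum(sum(f(x, y) * mu_x[x] for x in mu_x) * mu_y[y] for y in mu_y)


print(double_integral(f), iterated_xy(f), iterated_yx(f))
```

Because this particular $f$ factors as a function of $x$ times a function of $y$, all three values equal $(5/3)(19/4) = 95/12$.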


Problem 1. Give an example of a set in $R^2$ which is not a direct product
of any two sets in $R^1$.

Problem 2. Prove that the direct product of two rings (or σ-rings) need
not be a ring (or σ-ring).

Problem 3. Given three spaces X, Y and Z, equipped with measures
$\mu_x$, $\mu_y$ and $\mu_z$, respectively, prove that $(\mu_x \otimes \mu_y) \otimes \mu_z$ and $\mu_x \otimes (\mu_y \otimes \mu_z)$
are identical measures on X × Y × Z.

Problem 4. Let A = [−1, 1] × [−1, 1] and

$$f(x, y) = \frac{xy}{(x^2 + y^2)^2}.$$

Prove that

a) The iterated integrals (20) exist and are equal;

b) The double integral (19) fails to exist.

Hint. Since

$$\int_{-1}^{1} f(x, y)\, dx = \int_{-1}^{1} f(x, y)\, dy = 0,$$

we have

$$\int_{-1}^{1} \left( \int_{-1}^{1} f(x, y)\, dx \right) dy = \int_{-1}^{1} \left( \int_{-1}^{1} f(x, y)\, dy \right) dx = 0.$$

On the other hand, the double integral fails to exist, since

$$\int_{-1}^{1} \int_{-1}^{1} |f(x, y)|\, dx\, dy \ge \int_0^{2\pi} |\sin\theta \cos\theta|\, d\theta \int_0^1 \frac{dr}{r} = \infty$$

after transforming to polar coordinates.


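Problem 5. Let A = [0, 1] × [0, 1] and

$$f(x, y) = \begin{cases} 2^{2n} & \text{if } \dfrac{1}{2^n} \le x < \dfrac{1}{2^{n-1}},\ \dfrac{1}{2^n} \le y < \dfrac{1}{2^{n-1}}, \\[2mm] -2^{2n+1} & \text{if } \dfrac{1}{2^{n+1}} \le x < \dfrac{1}{2^n},\ \dfrac{1}{2^n} \le y < \dfrac{1}{2^{n-1}}, \\[2mm] 0 & \text{otherwise} \end{cases} \qquad (n = 1, 2, \ldots).$$

Prove that the iterated integrals (20) exist but are unequal.

Ans. $\int_0^1 \left( \int_0^1 f(x, y)\, dx \right) dy = 0$, $\quad \int_0^1 \left( \int_0^1 f(x, y)\, dy \right) dx = 1$.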
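Because this f is constant on products of the dyadic bands $[1/2^n, 1/2^{n-1})$ of the two axes, every inner integral can be computed exactly. The Python sketch below (exact rational arithmetic; helper names are ours) confirms the stated answer. It also shows why there is no contradiction with Fubini's theorem: the integral of $|f|$ over the cells meeting the first N horizontal bands equals 2N, so f is not summable on A.

```python
from fractions import Fraction


def band_length(n):
    """Length of the dyadic band [1/2^n, 1/2^(n-1)), n = 1, 2, ..."""
    return Fraction(1, 2 ** n)


def value(nx, ny):
    """Value of f on the cell (x-band nx) x (y-band ny), following Problem 5."""
    if nx == ny:
        return Fraction(2 ** (2 * ny))
    if nx == ny + 1:
        return Fraction(-(2 ** (2 * ny + 1)))
    return Fraction(0)


def inner_dx(ny):
    """Integral of f(x, y) dx for y in band ny: only x-bands ny and ny + 1 contribute."""
    return sum(value(nx, ny) * band_length(nx) for nx in (ny, ny + 1))


def inner_dy(nx):
    """Integral of f(x, y) dy for x in band nx: only y-bands nx and nx - 1 contribute."""
    return sum(value(nx, ny) * band_length(ny) for ny in (nx, nx - 1) if ny >= 1)


def iterated(N):
    """Both iterated integrals, with the outer integral summed over the first N bands."""
    dx_then_dy = sum(inner_dx(n) * band_length(n) for n in range(1, N + 1))
    dy_then_dx = sum(inner_dy(n) * band_length(n) for n in range(1, N + 1))
    return dx_then_dy, dy_then_dx


def abs_mass(N):
    """Integral of |f| over the cells in y-bands 1..N; each band contributes 2."""
    return sum(abs(value(nx, ny)) * band_length(nx) * band_length(ny)
               for ny in range(1, N + 1) for nx in (ny, ny + 1))


print(iterated(30), abs_mass(10))
```

Every inner integral in x vanishes, while the inner integral in y equals 2 on the band $[1/2, 1)$ and 0 elsewhere, which gives the two values 0 and 1.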

Problem 6. The preceding two problems show that the existence of the
iterated integrals (20) does not imply either the existence of the double
integral (19) or the validity of formula (15). However, show that the
existence of either of the integrals

$$\int_X \left( \int_{A_x} |f(x, y)|\, d\mu_y \right) d\mu_x, \qquad \int_Y \left( \int_{A_y} |f(x, y)|\, d\mu_x \right) d\mu_y \qquad (21)$$

implies both the existence of (19) and the validity of (15).

Hint. Suppose the first of the integrals (21) exists and equals M. The
function

$$f_n(x, y) = \min\{|f(x, y)|, n\}$$

is measurable and bounded, and hence summable on A. By Fubini's theorem,

$$\int_A f_n(x, y)\, d\mu = \int_X \left( \int_{A_x} f_n(x, y)\, d\mu_y \right) d\mu_x \le M.$$

Moreover, $\{f_n(x, y)\}$ is a nondecreasing sequence of functions converging
to $|f(x, y)|$. Use Levi's theorem to deduce the summability of $|f(x, y)|$
and hence that of f(x, y) on A.

Problem 7. Show that Fubini's theorem continues to hold for the case of
σ-finite measures (cf. Sec. 30.2).


36. The Stieltjes Integral 

36.1. Stieltjes measures. Let F be a nondecreasing function defined on a
closed interval [a, b], and suppose F is continuous from the left at every
point of (a, b]. Let $\mathcal{S}$ be the semiring of all subintervals (open, closed or
half-open) of [a, b), and let m be the measure on $\mathcal{S}$ defined by the formulas^7

$$\begin{aligned} m(\alpha, \beta) &= F(\beta) - F(\alpha + 0), \\ m[\alpha, \beta] &= F(\beta + 0) - F(\alpha), \\ m(\alpha, \beta] &= F(\beta + 0) - F(\alpha + 0), \\ m[\alpha, \beta) &= F(\beta) - F(\alpha). \end{aligned} \qquad (1)$$

Finally, let $\mu_F$ be the Lebesgue extension of m, defined on the σ-algebra
$\mathcal{S}_F$ of $\mu_F$-measurable sets. In particular, $\mathcal{S}_F$ contains all subintervals of
[a, b) and hence all Borel subsets of [a, b). Then $\mu_F$ is called the (Lebesgue-)
Stieltjes measure corresponding to the function F, and the function F itself
is called the generating function of $\mu_F$.

Example 1. The Stieltjes measure corresponding to the generating function
F(x) = x is just ordinary Lebesgue measure on the line.
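Formulas (1) translate directly into code once the right-hand limit $F(x + 0)$ is available. In the Python sketch below we assume the caller supplies both F and a function returning $F(x + 0)$, and we use a hypothetical generating function on [0, 2): $F(x) = x$ plus a jump of size 1 at $x = 1$. The four interval types are encoded by the hypothetical keys `"()"`, `"[]"`, `"(]"`, `"[)"`.

```python
def make_interval_measure(F, F_right):
    """Return m(kind, alpha, beta) implementing formulas (1).

    F is assumed nondecreasing and continuous from the left; F_right(x) stands
    for the right-hand limit F(x + 0) and must be supplied by the caller."""
    rules = {
        "()": lambda a, b: F(b) - F_right(a),        # m(alpha, beta)
        "[]": lambda a, b: F_right(b) - F(a),        # m[alpha, beta]
        "(]": lambda a, b: F_right(b) - F_right(a),  # m(alpha, beta]
        "[)": lambda a, b: F(b) - F(a),              # m[alpha, beta)
    }

    def m(kind, alpha, beta):
        return rules[kind](alpha, beta)

    return m


# Hypothetical generating function on [0, 2): F(x) = x plus a jump of size 1 at x = 1.
# Using "x > 1" (not ">=") makes F continuous from the left at the jump.
def F(x):
    return x + (1.0 if x > 1 else 0.0)


def F_right(x):
    return x + (1.0 if x >= 1 else 0.0)


m = make_interval_measure(F, F_right)
print(m("[]", 1, 1))   # the single point {1} carries the whole jump
print(m("[)", 1, 2))   # length 1 plus the jump at 1
```

With F(x) = x alone (no jump), F and F_right coincide and all four formulas give the ordinary length, as in Example 1.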


7 To avoid confusion, we omit "outer parentheses," writing μ(α, β) instead of μ((α, β)),
and similarly in the rest of the formulas (1). Moreover, in m[α, β], we allow the case
α = β.
Example 2. Let F be a jump function, with discontinuity points
$x_1, x_2, \ldots, x_n, \ldots$ and corresponding jumps $h_1, h_2, \ldots, h_n, \ldots$. Then
every subset $A \subset [a, b)$ is $\mu_F$-measurable, with measure

$$\mu_F(A) = \sum_{x_n \in A} h_n. \qquad (2)$$

In fact, according to (1), every single-element set $\{x_n\}$ has measure $h_n$, and
moreover it is clear that the measure of the complement of the set $\{x_1,
x_2, \ldots, x_n, \ldots\}$ is zero. But then (2) holds, by the σ-additivity of $\mu_F$. A
Stieltjes measure $\mu_F$ of this type, generated by a jump function, is said to be
discrete.

Example 3. Let F be an absolutely continuous nondecreasing function on
[a, b), with derivative f = F'. Then the Stieltjes measure $\mu_F$ is defined on
all Lebesgue-measurable subsets $A \subset [a, b)$ and

$$\mu_F(A) = \int_A f(x)\, dx. \qquad (3)$$

In fact, by Theorem 6, p. 340,

$$\mu_F(\alpha, \beta) = F(\beta) - F(\alpha) = \int_\alpha^\beta f(x)\, dx \qquad (4)$$

for every open interval (α, β). But then (3) holds for every Lebesgue-measurable
set $A \subset [a, b)$, since a Lebesgue extension of a σ-additive measure
is uniquely determined by its values on the original semiring.^8 A Stieltjes
measure $\mu_F$ of this type, with an absolutely continuous generating function,
is itself said to be absolutely continuous.

Example 4. Let F be singular (and continuous) as on p. 341. Then the
corresponding Stieltjes measure $\mu_F$ is concentrated on the set of Lebesgue
measure zero where the derivative F' is nonzero or fails to exist. A Stieltjes
measure of this type is said to be singular.

Example 5. By the Lebesgue decomposition (p. 341), an arbitrary
generating function F can be represented as a sum

$$F(x) = D(x) + A(x) + S(x) \qquad (5)$$

of a jump function D, an absolutely continuous function A and a singular
function S (verify that D, A and S are themselves generating functions).
Moreover, each of the "components" D, A and S is uniquely determined to
within an additive constant (see Problem 4, p. 342). But clearly

$$\mu_F = \mu_D + \mu_A + \mu_S.$$


8 Give a more detailed argument, recalling Problem 1, p. 279. Note that in this case
m(α, β) = m[α, β] = m(α, β] = m[α, β).




It follows that an arbitrary Lebesgue-Stieltjes measure can be represented
as a sum of a discrete measure $\mu_D$, an absolutely continuous measure $\mu_A$ and
a singular measure $\mu_S$. Moreover, this representation is unique (why?).

Remark. We can easily extend the notion of a Stieltjes measure on a
(finite) interval [a, b) to that of a Stieltjes measure on the whole line (−∞, ∞).
Let F be a bounded nondecreasing function on (−∞, ∞), so that

$$m \le F(x) \le M \qquad (-\infty < x < \infty).$$

Using the formulas (1) to define the measure of arbitrary intervals (open,
closed or half-open), not just subintervals of a fixed interval [a, b), we get a
finite measure $\mu_F$ on the whole line, called a (Lebesgue-)Stieltjes measure,
as before. In particular, we have

$$\mu_F(-\infty, \infty) = F(\infty) - F(-\infty)$$

for the measure of the whole line, where

$$F(\infty) = \lim_{x \to \infty} F(x), \qquad F(-\infty) = \lim_{x \to -\infty} F(x)$$

(the existence of the limits follows from the fact that F is bounded and
monotonic).

36.2. The Lebesgue-Stieltjes integral. Let $\mu_F$ be a Stieltjes measure on
the interval [a, b), corresponding to the generating function F, and let f be
a $\mu_F$-summable function. Then by the Lebesgue-Stieltjes integral of f (with
respect to F), denoted by

$$\int_a^b f(x)\, dF(x), \qquad (6)$$

we simply mean the Lebesgue integral

$$\int_{[a, b)} f(x)\, d\mu_F.$$

Example 1. Let F be the jump function

$$F(x) = \sum_{x_n < x} h_n,$$

so that $\mu_F$ is a discrete measure. Then (6) reduces to the sum

$$\sum_n f(x_n)\, h_n.$$

Example 2. If F is absolutely continuous, then

$$\int_a^b f(x)\, dF(x) = \int_a^b f(x) F'(x)\, dx, \qquad (7)$$

where the right-hand side is the integral of fF' with respect to ordinary
Lebesgue measure on the line. In the case where f(x) ≡ const, this is an
immediate consequence of (4). Moreover, by the σ-additivity of integrals,
(7) can be extended to the case of any simple function f which is $\mu_F$-summable.
More generally, let $\{f_n\}$ be a sequence of such simple functions
converging uniformly to f, so that $\{f_n F'\}$ converges uniformly to fF'. It can
be assumed without loss of generality that

$$f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots,$$

and hence that

$$f_1(x) F'(x) \le f_2(x) F'(x) \le \cdots \le f_n(x) F'(x) \le \cdots.$$

Therefore, applying Levi's theorem (Theorem 2, p. 305) to both sequences
$\{f_n\}$ and $\{f_n F'\}$, we get

$$\int_a^b f(x)\, dF(x) = \lim_{n \to \infty} \int_a^b f_n(x)\, dF(x) = \lim_{n \to \infty} \int_a^b f_n(x) F'(x)\, dx = \int_a^b f(x) F'(x)\, dx.$$

Example 3. Suppose

$$F(x) = D(x) + A(x),$$

where D is the jump function

$$D(x) = \sum_{x_n < x} h_n$$

and A is absolutely continuous. Then it follows from Examples 1 and 2 that

$$\int_a^b f(x)\, dF(x) = \sum_n f(x_n)\, h_n + \int_a^b f(x) A'(x)\, dx.$$

In the case where F also contains a singular component, as in (5), there is no
such representation of the Lebesgue-Stieltjes integral (6) as the sum of a series
and an ordinary Lebesgue integral.
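The formula of Example 3 can be evaluated directly: a finite series over the jumps plus an ordinary integral. The Python sketch below assumes a hypothetical generating function on [0, 1) with jumps $h = 0.5$ at $x = 0.25$ and $h = 1$ at $x = 0.5$ and absolutely continuous part $A(x) = x^2$; the ordinary integral is approximated by the midpoint rule, which is our choice of quadrature and not part of the text.

```python
# Hypothetical generating function on [0, 1): F(x) = D(x) + A(x), where
#   D(x) = sum of the jumps h_n at points x_n < x,  and  A(x) = x**2, so A'(x) = 2x.
JUMPS = [(0.25, 0.5), (0.5, 1.0)]   # pairs (x_n, h_n)


def A_prime(x):
    return 2.0 * x


def lebesgue_stieltjes(f, a, b, n=10_000):
    """Integral of f with respect to F over [a, b), split as in Example 3:
    the series over the jumps plus a midpoint-rule approximation of the
    ordinary integral of f(x) A'(x)."""
    discrete_part = sum(f(xn) * hn for xn, hn in JUMPS if a <= xn < b)
    h = (b - a) / n
    mids = (a + (i + 0.5) * h for i in range(n))
    continuous_part = sum(f(x) * A_prime(x) for x in mids) * h
    return discrete_part + continuous_part


# With f(x) = 1 the integral is F(1) - F(0) = (0.5 + 1.0) + 1 = 2.5.
print(lebesgue_stieltjes(lambda x: 1.0, 0.0, 1.0))
# With f(x) = x the exact value is 0.25*0.5 + 0.5*1.0 + 2/3.
print(lebesgue_stieltjes(lambda x: x, 0.0, 1.0))
```

Taking f(x) = 1 recovers the total measure $\mu_F[0, 1) = F(1) - F(0)$, a quick consistency check.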

Remark. We can easily extend the notion of a Lebesgue-Stieltjes integral
with respect to a nondecreasing function F to that of a Lebesgue-Stieltjes
integral with respect to an arbitrary function of bounded variation $\Phi$. In
fact, as in Theorem 4, p. 331, let

$$\Phi = v - g,$$

where v, the total variation of $\Phi$ on the interval [a, x], and $g = v - \Phi$ are
both nondecreasing. We then set

$$\int_a^b f(x)\, d\Phi(x) = \int_a^b f(x)\, dv(x) - \int_a^b f(x)\, dg(x) \qquad (8)$$

by definition (see Problem 2).

36.3. Applications to probability theory. The Lebesgue-Stieltjes integral
is widely used in mathematical analysis and its applications. The concept
plays a particularly important role in probability theory. Given a random
variable ξ,^9 let

$$F(x) = P\{\xi < x\},$$

i.e., let F(x) be the probability that ξ takes a value less than x. Then F is
clearly nondecreasing and continuous from the left. Moreover, F satisfies
the conditions

$$F(-\infty) = 0, \qquad F(\infty) = 1$$

(why?). Conversely, every such function F can be represented as the probability
distribution of some random variable ξ.

Two basic numerical characteristics of a random variable ξ are its
mathematical expectation or mean (value)

$$E\xi = \int_{-\infty}^{\infty} x\, dF(x), \qquad (9)$$

and variance

$$D\xi = \int_{-\infty}^{\infty} (x - E\xi)^2\, dF(x) \qquad (10)$$

(however, see Problem 5).

Example 1. A random variable ξ is said to be discrete if it can take no
more than countably many values $x_1, x_2, \ldots, x_n, \ldots$. For example, the
number of calls received on a given telephone line during a given time
interval is a discrete random variable. Let

$$p_n = P\{\xi = x_n\} \qquad (n = 1, 2, \ldots)$$

be the probability of the random variable ξ taking the value $x_n$. Then the
distribution function of ξ is just the jump function

$$F(x) = \sum_{x_n < x} p_n.$$

In this case, the integrals (9) and (10) for the mean and variance of ξ reduce
to the sums

$$E\xi = \sum_n x_n p_n, \qquad D\xi = \sum_n (x_n - a)^2 p_n \quad (a = E\xi).$$
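For a discrete random variable the two sums can be computed exactly. The Python sketch below uses the standard `fractions` module and, as a hypothetical example, a fair die (values 1 to 6, each with probability 1/6); the function name is ours.

```python
from fractions import Fraction


def mean_and_variance(values, probs):
    """E xi = sum of x_n p_n and D xi = sum of (x_n - a)^2 p_n, with a = E xi."""
    values, probs = list(values), list(probs)
    if sum(probs) != 1:
        raise ValueError("the probabilities p_n must sum to 1")
    a = sum(x * p for x, p in zip(values, probs))
    d = sum((x - a) ** 2 * p for x, p in zip(values, probs))
    return a, d


# A fair die: values 1..6, each with probability 1/6.
die_mean, die_var = mean_and_variance(range(1, 7), [Fraction(1, 6)] * 6)
print(die_mean, die_var)   # 7/2 35/12
```

Exact fractions avoid rounding, so the output is exactly $E\xi = 7/2$ and $D\xi = 35/12$.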

Example 2. A random variable ξ is said to be continuous if its distribution
function F is absolutely continuous. The derivative

$$p(x) = F'(x)$$

of the distribution function is then called the probability density of ξ. It
follows from Example 2, p. 364 that in this case the integrals (9) and (10)
for the mean and variance of ξ reduce to the following integrals with respect
to ordinary Lebesgue measure on the line:

$$E\xi = \int_{-\infty}^{\infty} x\, p(x)\, dx, \qquad D\xi = \int_{-\infty}^{\infty} (x - a)^2 p(x)\, dx \quad (a = E\xi).$$

9 We presuppose familiarity with the rudiments of probability theory. See e.g., Y. A.
Rozanov, Introductory Probability Theory (translated by R. A. Silverman), Prentice-Hall,
Inc., Englewood Cliffs, N.J. (1969).
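When the density is known, the two integrals can be approximated numerically. The Python sketch below uses the midpoint rule (our choice of quadrature) and assumes the density vanishes outside a finite interval [lo, hi]; the examples are the uniform density on [0, 1] (mean 1/2, variance 1/12) and the exponential density $e^{-x}$, truncated at x = 50 (mean and variance both 1, up to a negligible truncation error).

```python
import math


def moments_from_density(p, lo, hi, n=100_000):
    """Midpoint-rule approximations of E xi = integral of x p(x) dx and
    D xi = integral of (x - a)^2 p(x) dx, assuming p vanishes outside [lo, hi]."""
    h = (hi - lo) / n
    mids = [lo + (i + 0.5) * h for i in range(n)]
    a = sum(x * p(x) for x in mids) * h
    d = sum((x - a) ** 2 * p(x) for x in mids) * h
    return a, d


uniform_mean, uniform_var = moments_from_density(lambda x: 1.0, 0.0, 1.0)
exp_mean, exp_var = moments_from_density(lambda x: math.exp(-x), 0.0, 50.0)
print(uniform_mean, uniform_var, exp_mean, exp_var)
```

The results are approximations; their accuracy depends on the step size and on how much probability lies outside [lo, hi].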

36.4. The Riemann-Stieltjes integral. Besides the Lebesgue-Stieltjes integral
introduced in Sec. 36.2 (which is in effect nothing but the difference
between two ordinary Lebesgue integrals with respect to two measures on the
real line$^{10}$), we can also introduce the Riemann-Stieltjes integral, defined
as a limit of certain approximating sums, analogous to those used to define
the ordinary Riemann integral. To this end, let $f$ and $\Phi$ be two functions on
$[a, b]$, where $\Phi$ is of bounded variation and continuous from the left, and let

$$a = x_0 < x_1 < x_2 < \cdots < x_n = b$$

be a partition of the interval $[a, b]$ by points of subdivision $x_0, x_1, x_2, \ldots,
x_n$. Choosing an arbitrary point $\xi_k$ in each subinterval $[x_{k-1}, x_k]$, we form
the sum

$$\sum_{k=1}^{n} f(\xi_k)\,[\Phi(x_k) - \Phi(x_{k-1})]. \tag{11}$$

Suppose that as the partition is “refined,” i.e., as the quantity

$$\max\{x_1 - x_0,\ x_2 - x_1,\ \ldots,\ x_n - x_{n-1}\} \tag{12}$$

(equal to the maximum length of the subintervals) approaches zero, the sum
(11) approaches a limit independent of the choice of both the points of
subdivision $x_k$ and the “intermediate points” $\xi_k$. Then this limit is called
the Riemann-Stieltjes integral of $f$ with respect to $\Phi$, and is denoted by

$$\int_a^b f(x)\,d\Phi(x)$$

(just as in the case of the Lebesgue-Stieltjes integral).
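Definition (11) translates directly into code. The Python sketch below forms the sum (11) for a given partition and an arbitrary (here random) choice of intermediate points. The function names, the uniform partitions, and the test data are assumptions made for illustration. For $f(x) = x$ and the smooth function $\Phi(x) = x^2$ on $[0, 1]$, the sums should approach $\int_0^1 x \cdot 2x\,dx = 2/3$ as the quantity (12) tends to zero.

```python
import random

def rs_sum(f, phi, xs, tags):
    """Sum (11): sum over k of f(xi_k) * [phi(x_k) - phi(x_{k-1})].

    xs   -- partition points a = x_0 < x_1 < ... < x_n = b
    tags -- intermediate points, tags[k - 1] lying in [x_{k-1}, x_k]
    """
    return sum(f(t) * (phi(x1) - phi(x0)) for x0, x1, t in zip(xs, xs[1:], tags))

def uniform_partition(a, b, n):
    return [a + (b - a) * k / n for k in range(n + 1)]

def random_tags(xs, rng):
    # an arbitrary choice of one intermediate point in each subinterval
    return [rng.uniform(x0, x1) for x0, x1 in zip(xs, xs[1:])]

rng = random.Random(0)
for n in (10, 100, 1000):
    xs = uniform_partition(0.0, 1.0, n)
    print(n, rs_sum(lambda x: x, lambda x: x * x, xs, random_tags(xs, rng)))
```

Because the intermediate points are drawn at random, the printed sums differ from run to run only through the seed. Their distance from $2/3$ is at most the mesh (12), since $|\xi_k - x| \le x_k - x_{k-1}$ on each subinterval and $\Phi$ increases by 1 in total.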

Remark. If $\Phi = \Phi_1 + \Phi_2$, then

$$\int_a^b f(x)\,d\Phi(x) = \int_a^b f(x)\,d\Phi_1(x) + \int_a^b f(x)\,d\Phi_2(x) \tag{13}$$

(provided the integrals on the right exist). In fact, we need only write the
identity

$$\sum_{k=1}^{n} f(\xi_k)[\Phi(x_k) - \Phi(x_{k-1})] = \sum_{k=1}^{n} f(\xi_k)[\Phi_1(x_k) - \Phi_1(x_{k-1})] + \sum_{k=1}^{n} f(\xi_k)[\Phi_2(x_k) - \Phi_2(x_{k-1})]$$

and then pass to the limit as the quantity (12) approaches zero.

10 Recall formula (8).

Theorem 1. If $f$ is continuous on $[a, b]$, then its Riemann-Stieltjes
integral exists and coincides with its Lebesgue-Stieltjes integral.

Proof. The sum (11) can be regarded as the Lebesgue-Stieltjes integral
of the step function

$$f_n(x) = f(\xi_k) \quad \text{if } x_{k-1} \le x < x_k \qquad (k = 1, \ldots, n).$$

As the partition of $[a, b]$ is refined, the sequence $\{f_n\}$ converges uniformly
to $f$ (why?). Hence, by the very definition of the Lebesgue integral
(recall p. 294),

$$\lim_{n\to\infty} \int_a^b f_n(x)\,d\Phi(x) = I,$$

where $I$ is the Lebesgue-Stieltjes integral of $f$ over $[a, b)$. But then

$$\lim_{n\to\infty} \sum_{k=1}^{n} f(\xi_k)[\Phi(x_k) - \Phi(x_{k-1})] = I,$$

where the limit on the left is the Riemann-Stieltjes integral of $f$ over
$[a, b]$. ∎

Theorem 2. If $f$ is continuous on $[a, b]$, then

$$\Big|\int_a^b f(x)\,d\Phi(x)\Big| \le V_a^b(\Phi)\,\max_{a \le x \le b}|f(x)|, \tag{14}$$

where $V_a^b(\Phi)$ is the total variation of $\Phi$ on $[a, b]$.

Proof. The inequality

$$\Big|\sum_{k=1}^{n} f(\xi_k)[\Phi(x_k) - \Phi(x_{k-1})]\Big| \le \sum_{k=1}^{n} |f(\xi_k)|\,|\Phi(x_k) - \Phi(x_{k-1})| \le \max_{a \le x \le b}|f(x)| \sum_{k=1}^{n} |\Phi(x_k) - \Phi(x_{k-1})| \le V_a^b(\Phi)\,\max_{a \le x \le b}|f(x)|$$

holds for any partition of the interval $[a, b]$. Taking the limit of the
left-hand side as $\max\{x_1 - x_0, \ldots, x_n - x_{n-1}\} \to 0$, we get (14). ∎

Remark. If $\Phi(x) = x$, (14) reduces to the familiar estimate

$$\Big|\int_a^b f(x)\,dx\Big| \le (b - a)\max_{a \le x \le b}|f(x)|$$

for the ordinary Riemann integral.
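The chain of inequalities in the proof can be watched on a single partition. This is a numerical sanity check, not a proof. The Python sketch compares one approximating sum (11) with the product of $\max|f|$ over the intermediate points and the variation of $\Phi$ along the partition, which never exceeds $V_a^b(\Phi)$. The choices $f(x) = \cos 5x$ and $\Phi(x) = \sin 3x$ on $[0, 2]$ are arbitrary. For them $V_0^2(\Phi) = 1 + 2 + (1 - |\sin 6|) \approx 3.72$.

```python
import math

def check_estimate(f, phi, a, b, n):
    """Return (sum (11), bound) on a uniform partition with midpoints as tags."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    tags = [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]
    s = sum(f(t) * (phi(x1) - phi(x0)) for x0, x1, t in zip(xs, xs[1:], tags))
    variation = sum(abs(phi(x1) - phi(x0)) for x0, x1 in zip(xs, xs[1:]))  # <= V_a^b(phi)
    sup_f = max(abs(f(t)) for t in tags)                                      # <= max |f|
    return s, variation * sup_f

s, bound = check_estimate(lambda x: math.cos(5 * x), lambda x: math.sin(3 * x), 0.0, 2.0, 10_000)
print(s, bound)
```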
Theorem 3. Let $\Phi$ be a function of bounded variation on $[a, b]$, different
from zero at no more than countably many points $c_1, c_2, \ldots, c_n, \ldots$ in
$(a, b)$. Then

$$\int_a^b f(x)\,d\Phi(x) = 0 \tag{15}$$

for any function $f$ continuous on $[a, b]$.

Proof. The assertion is obvious if $\Phi$ is nonzero at only a single point
$c_1 \in (a, b)$, since then

$$\sum_{k=1}^{n} f(\xi_k)[\Phi(x_k) - \Phi(x_{k-1})] = 0$$

for an “arbitrarily fine” partition

$$a = x_0 < x_1 < \cdots < x_n = b,$$

i.e., a partition for which the quantity (12) is arbitrarily small, provided
we make sure that $c_1$ is not one of the points of subdivision $x_0, x_1, \ldots,
x_n$.$^{11}$ Hence, by (13), the assertion is also true if $\Phi$ is nonzero at only
finitely many points in $(a, b)$. Now suppose $\Phi$ is different from zero at
countably many points

$$c_1, c_2, \ldots, c_n, \ldots$$

in $(a, b)$, and let

$$\gamma_n = \Phi(c_n).$$

Then

$$\sum_{n=1}^{\infty} |\gamma_n| < \infty,$$

since $\Phi$ is of bounded variation. Given any $\varepsilon > 0$, we choose $N$ such that

$$\sum_{n=N+1}^{\infty} |\gamma_n| < \varepsilon,$$

and write $\Phi$ in the form

$$\Phi = \Phi_N + \Phi^*, \tag{16}$$

where $\Phi_N$ takes the values $\gamma_1, \ldots, \gamma_N$ at the points $c_1, \ldots, c_N$ and is
zero elsewhere, while $\Phi^*$ takes the values $\gamma_{N+1}, \gamma_{N+2}, \ldots$ at the points
$c_{N+1}, c_{N+2}, \ldots$ and is zero elsewhere. Then, as just shown,

$$\int_a^b f(x)\,d\Phi_N(x) = 0. \tag{17}$$

Moreover,

$$\Big|\sum_{k=1}^{m} f(\xi_k)[\Phi^*(x_k) - \Phi^*(x_{k-1})]\Big| \le 2M \sum_{n=N+1}^{\infty} |\gamma_n| < 2M\varepsilon,$$

where

$$M = \max_{a \le x \le b} |f(x)|,$$

since each point $c_n$ contributes to at most two of the differences on the left. Hence

$$\Big|\int_a^b f(x)\,d\Phi^*(x)\Big| \le 2M\varepsilon \tag{18}$$

after taking the limit as $m \to \infty$. By (13), (16) and (17), the integral of $f$ with respect to $\Phi$
equals the integral in (18), so that $\big|\int_a^b f(x)\,d\Phi(x)\big| \le 2M\varepsilon$. Formula (15) now follows
at once, since $\varepsilon > 0$ is arbitrary. ∎

11 Note that here we rely on the fact that $c_1$ is not an end point of $[a, b]$.
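The first step of the proof, and the role of footnote 11, can be seen numerically. In the hypothetical example below, $\Phi$ equals 1 at the single interior point $c = 1/2$ of $[0, 1]$ and 0 elsewhere, $f = e^x$, and the intermediate points are taken at left end points (an arbitrary choice). When $c$ is not a point of subdivision, which is the case for odd $n$ in this uniform grid, the sum (11) is exactly zero. When it is a point of subdivision, two terms survive but nearly cancel, and their sum tends to zero with the mesh.

```python
import math

C = 0.5  # the single interior point at which Phi is nonzero (hypothetical example)

def phi(x):
    return 1.0 if x == C else 0.0

def rs_sum_left_tags(f, n, a=0.0, b=1.0):
    """Sum (11) on a uniform partition, taking xi_k = x_{k-1} (left end points)."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(f(x0) * (phi(x1) - phi(x0)) for x0, x1 in zip(xs, xs[1:]))

for n in (11, 101, 10, 100, 1000):
    print(n, rs_sum_left_tags(math.exp, n))
```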

36.5. Helly’s theorems. In Sec. 30.1 we found conditions insuring the
validity of passing to the limit in Lebesgue integrals, i.e., conditions under
which

$$\lim_{n\to\infty} \int_A f_n(x)\,d\mu = \int_A f(x)\,d\mu, \tag{19}$$

where $\{f_n\}$ is a sequence of functions converging (almost everywhere) to a
function $f$ and the integrals are all with respect to a fixed measure $\mu$. In
the case of Stieltjes integrals, we now ask a closely related but somewhat
different question: Under what conditions does the formula

$$\lim_{n\to\infty} \int_a^b f(x)\,d\Phi_n(x) = \int_a^b f(x)\,d\Phi(x) \tag{20}$$

hold, where $f$ is continuous and $\{\Phi_n\}$ is a sequence of functions of bounded
variation converging (everywhere) to a function $\Phi$? (Note that here, unlike
(19), the function $f$ is fixed, and it is the function $\Phi_n$, or the corresponding
Stieltjes measure, which varies.) The answer to this question is given by

Theorem 4 (Helly’s convergence theorem). Let $\{\Phi_n\}$ be a sequence of
functions of bounded variation on $[a, b]$, converging to a function $\Phi$ at every
point of $[a, b]$. Suppose the sequence of total variations $\{V_a^b(\Phi_n)\}$ is
bounded, so that

$$V_a^b(\Phi_n) \le C \qquad (n = 1, 2, \ldots) \tag{21}$$

for some constant $C > 0$. Then $\Phi$ is also of bounded variation on $[a, b]$,
and (20) holds for every function $f$ continuous on $[a, b]$.


Proof. Let

$$a = x_0 < x_1 < \cdots < x_m = b$$

be any partition of the interval $[a, b]$ by points of subdivision $x_0, x_1, \ldots,
x_m$. Then

$$\sum_{k=1}^{m} |\Phi(x_k) - \Phi(x_{k-1})| = \lim_{n\to\infty} \sum_{k=1}^{m} |\Phi_n(x_k) - \Phi_n(x_{k-1})| \le C,$$

and hence

$$V_a^b(\Phi) \le C, \tag{22}$$

i.e., $\Phi$ is of bounded variation on $[a, b]$, as asserted.

Next we show that (20) holds if $f$ is a step function. Suppose

$$f(x) = h_k \quad \text{if } x_{k-1} \le x < x_k.$$

Then

$$\int_a^b f(x)\,d\Phi_n(x) = \sum_k h_k[\Phi_n(x_k) - \Phi_n(x_{k-1})] \tag{23}$$

and$^{12}$

$$\int_a^b f(x)\,d\Phi(x) = \sum_k h_k[\Phi(x_k) - \Phi(x_{k-1})], \tag{24}$$

where obviously (23) approaches (24) as $n \to \infty$. Now let $f$ be continuous
on $[a, b]$. Given any $\varepsilon > 0$, choose a step function $f_\varepsilon$ such that

$$|f(x) - f_\varepsilon(x)| < \frac{\varepsilon}{3C} \qquad (a \le x \le b) \tag{25}$$

(why is this possible?). Then

$$\Big|\int_a^b f(x)\,d\Phi(x) - \int_a^b f(x)\,d\Phi_n(x)\Big| \le |I_1| + |I_2| + |I_3|, \tag{26}$$

where

$$I_1 = \int_a^b f(x)\,d\Phi(x) - \int_a^b f_\varepsilon(x)\,d\Phi(x),$$

$$I_2 = \int_a^b f_\varepsilon(x)\,d\Phi(x) - \int_a^b f_\varepsilon(x)\,d\Phi_n(x),$$

$$I_3 = \int_a^b f_\varepsilon(x)\,d\Phi_n(x) - \int_a^b f(x)\,d\Phi_n(x).$$

By the inequality (14), which clearly holds for Lebesgue-Stieltjes integrals
as well as for Riemann-Stieltjes integrals (why?), we have

$$|I_1| \le \frac{\varepsilon}{3C}\,V_a^b(\Phi) \le \frac{\varepsilon}{3}, \qquad |I_3| \le \frac{\varepsilon}{3C}\,V_a^b(\Phi_n) \le \frac{\varepsilon}{3} \tag{27}$$

after using (21), (22) and (25). Moreover, as just shown,

$$|I_2| < \frac{\varepsilon}{3} \tag{28}$$

for sufficiently large $n$. It follows from (26)–(28) that

$$\Big|\int_a^b f(x)\,d\Phi_n(x) - \int_a^b f(x)\,d\Phi(x)\Big| < \varepsilon,$$

which implies (20), since $\varepsilon > 0$ is arbitrary. ∎

12 Think of (23) and (24) as Lebesgue-Stieltjes integrals.
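A standard example (chosen here for illustration, not taken from the text) shows Theorem 4 at work. On $[0, 1]$ let $\Phi_n(x) = x^n$. Each $\Phi_n$ is nondecreasing, so $V_0^1(\Phi_n) = 1$ and (21) holds with $C = 1$. The pointwise limit $\Phi$ is 0 on $[0, 1)$ and 1 at $x = 1$, so $\int_0^1 f\,d\Phi = f(1)$. The Python sketch approximates $\int_0^1 \cos x\,d\Phi_n(x)$ by sums (11) on a fine uniform partition, with midpoints as intermediate points. The values should approach $\cos 1$.

```python
import math

def rs_sum(f, phi, n_sub, a=0.0, b=1.0):
    """Sum (11) on a uniform partition of [a, b], with midpoints as intermediate points."""
    xs = [a + (b - a) * k / n_sub for k in range(n_sub + 1)]
    return sum(f((x0 + x1) / 2) * (phi(x1) - phi(x0)) for x0, x1 in zip(xs, xs[1:]))

def phi_n(n):
    # Phi_n(x) = x**n: nondecreasing on [0, 1], total variation 1 for every n
    return lambda x: x ** n

def phi_limit(x):
    # pointwise limit of x**n on [0, 1]: 0 for x < 1 and 1 at x = 1
    return 1.0 if x >= 1.0 else 0.0

for n in (1, 10, 100, 1000):
    print(n, rs_sum(math.cos, phi_n(n), 20_000))
print("limit:", rs_sum(math.cos, phi_limit, 20_000), "cos(1) =", math.cos(1.0))
```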

Theorem 4 gives conditions under which we can take the limit of a
sequence $\{\Phi_n\}$ of functions of bounded variation inside a Stieltjes integral.
The next theorem gives conditions guaranteeing the existence of a sequence
$\{\Phi_n\}$ meeting the requirements of Theorem 4.

Theorem 5 (Helly’s selection principle). Let $\Phi$ be a family of functions
defined on an interval $[a, b]$ and satisfying the conditions

$$V_a^b(\varphi) \le C, \qquad \sup_{a \le x \le b}|\varphi(x)| \le M \qquad (\varphi \in \Phi) \tag{29}$$

for suitable $C$ and $M$. Then $\Phi$ contains a sequence which converges for
every $x \in [a, b]$.


Proof. It is enough to prove the theorem for nondecreasing functions.
In fact, let

$$\varphi = v - g,$$

where $v$ is the total variation of $\varphi$ on $[a, x]$. Then the functions $v$
corresponding to all $\varphi \in \Phi$ are nondecreasing and satisfy the conditions of
the theorem, since

$$V_a^b(v) = V_a^b(\varphi) \le C, \qquad \sup_{a \le x \le b}|v(x)| \le C.$$

Assuming that the theorem holds for nondecreasing functions, we choose
a sequence $\{\varphi_n\}$ from $\Phi$ such that $v_n$ converges to a limit $v^*$ on $[a, b]$.
Then the functions

$$g_n = v_n - \varphi_n$$

are also nondecreasing and satisfy the conditions of the theorem (why?).
Therefore $\{\varphi_n\}$ contains a subsequence $\{\varphi_{n_k}\}$ such that $\{g_{n_k}\}$ converges
to a limit $g^*$ on $[a, b]$. But then

$$\lim_{k\to\infty} \varphi_{n_k}(x) = \varphi^*(x),$$

where

$$\varphi^*(x) = v^*(x) - g^*(x).$$

Thus we now proceed to prove the theorem for nondecreasing
functions. Let $r_1, r_2, \ldots, r_n, \ldots$ be the rational points of $[a, b]$. It
follows from (29) that the set of numbers

$$\varphi(r_1) \qquad (\varphi \in \Phi)$$

is bounded. Hence there is a sequence of functions $\{\varphi_n^{(1)}\}$ converging at
the point $r_1$. Similarly, $\{\varphi_n^{(1)}\}$ contains a subsequence $\{\varphi_n^{(2)}\}$ converging
at the point $r_2$ as well as at $r_1$, $\{\varphi_n^{(2)}\}$ contains a subsequence $\{\varphi_n^{(3)}\}$
converging at the point $r_3$ as well as at $r_1$ and $r_2$, and so on. The “diagonal
sequence”

$$\psi_n = \varphi_n^{(n)}$$

will then converge at every rational point of $[a, b]$. The limit of this
sequence is a nondecreasing function $\psi$, defined only at the points
$r_1, r_2, \ldots, r_n, \ldots$. We complete the definition of $\psi$ at the remaining
points of $[a, b]$ by setting

$$\psi(x) = \lim_{\substack{r \to x - 0 \\ r \text{ rational}}} \psi(r) \quad \text{if } x \text{ is irrational}.$$

The resulting function $\psi$ is then the limit of $\{\psi_n\}$ at every continuity
point of $\psi$. In fact, let $x^*$ be such a point. Then, given any $\varepsilon > 0$, there
is a $\delta > 0$ such that

$$|\psi(x^*) - \psi(x)| < \frac{\varepsilon}{6} \tag{30}$$

if $|x^* - x| < \delta$. Let $r'$ and $r''$ be rational numbers such that

$$x^* - \delta < r' < x^* < r'' < x^* + \delta,$$

and let $n$ be so large that

$$|\psi_n(r') - \psi(r')| < \frac{\varepsilon}{6}, \qquad |\psi_n(r'') - \psi(r'')| < \frac{\varepsilon}{6}. \tag{31}$$

It follows from (30) and (31) that

$$|\psi_n(r'') - \psi_n(r')| < \frac{2\varepsilon}{3}.$$

Since $\psi_n$ is a nondecreasing function, we have

$$\psi_n(r') \le \psi_n(x^*) \le \psi_n(r''),$$

and hence

$$|\psi(x^*) - \psi_n(x^*)| \le |\psi(x^*) - \psi(r')| + |\psi(r') - \psi_n(r')| + |\psi_n(r') - \psi_n(x^*)| < \frac{\varepsilon}{6} + \frac{\varepsilon}{6} + \frac{2\varepsilon}{3} = \varepsilon.$$

Therefore

$$\lim_{n\to\infty} \psi_n(x^*) = \psi(x^*),$$

since $\varepsilon > 0$ is arbitrary.

Thus we have constructed a sequence $\{\psi_n\}$ of functions in $\Phi$ converging
to a limit function everywhere except possibly at discontinuity
points of $\psi$. Since there are no more than countably many such points
(why?), we can again use the “diagonal process” to find a subsequence
of $\{\psi_n\}$ which converges at these points as well, and hence converges
everywhere on $[a, b]$. ∎

36.6. The Riesz representation theorem. Next we show how Stieltjes
integrals can be used to represent the general continuous linear functional on the space
$C_{[a,b]}$ of all functions continuous on the interval $[a, b]$:

Theorem 6 (F. Riesz). Every continuous linear functional $\varphi$ on the
space $C_{[a,b]}$ can be represented in the form

$$\varphi(f) = \int_a^b f(x)\,d\Phi(x), \tag{32}$$

where $\Phi$ is a function of bounded variation on $[a, b]$, and moreover

$$\|\varphi\| = V_a^b(\Phi). \tag{33}$$

Proof. The space $C_{[a,b]}$ can be regarded as a subspace of the space
$M_{[a,b]}$ of all bounded functions on $[a, b]$, with the same norm

$$\|f\| = \sup_{a \le x \le b}|f(x)|$$

as in $C_{[a,b]}$. Let $\varphi$ be a continuous linear functional on $C_{[a,b]}$. By the
Hahn-Banach theorem (Theorem 5, p. 180), $\varphi$ can be extended without
changing its norm from $C_{[a,b]}$ onto the whole space $M_{[a,b]}$. In particular,
this extended functional will be defined on all functions of the form

$$f_\tau(x) = \begin{cases} 1 & \text{if } x < \tau, \\ 0 & \text{if } x \ge \tau \end{cases} \qquad (a \le \tau \le b). \tag{34}$$

Let

$$\Phi(\tau) = \varphi(f_\tau). \tag{35}$$

Then $\Phi$ is of bounded variation on $[a, b]$. In fact, given any partition

$$a = x_0 < x_1 < \cdots < x_n = b \tag{36}$$

of $[a, b]$, let

$$\alpha_k = \operatorname{sgn}[\Phi(x_k) - \Phi(x_{k-1})] \qquad (k = 1, \ldots, n),$$

where

$$\operatorname{sgn} x = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}$$

Then

$$\sum_{k=1}^{n} |\Phi(x_k) - \Phi(x_{k-1})| = \sum_{k=1}^{n} \alpha_k[\Phi(x_k) - \Phi(x_{k-1})] = \varphi\Big(\sum_{k=1}^{n} \alpha_k(f_{x_k} - f_{x_{k-1}})\Big) \le \|\varphi\|\,\Big\|\sum_{k=1}^{n} \alpha_k(f_{x_k} - f_{x_{k-1}})\Big\|.$$

But the function

$$\sum_{k=1}^{n} \alpha_k(f_{x_k} - f_{x_{k-1}})$$

can only take the values $0, \pm 1$, and hence its norm does not exceed 1. Therefore

$$\sum_{k=1}^{n} |\Phi(x_k) - \Phi(x_{k-1})| \le \|\varphi\|.$$

Since this is true for any partition of $[a, b]$, we have

$$V_a^b(\Phi) \le \|\varphi\|, \tag{37}$$

i.e., $\Phi$ is of bounded variation on $[a, b]$, as asserted.

We now show that the functional $\varphi$ can be represented in the form of a
Stieltjes integral with respect to the function $\Phi$ just constructed. Let $f$
be any function continuous on $[a, b]$. Given any $\varepsilon > 0$, let $\delta > 0$ be
such that $|x' - x''| < \delta$ implies $|f(x') - f(x'')| < \varepsilon$. Suppose the
partition (36) is such that each subinterval $[x_{k-1}, x_k]$ is of length less than
$\delta$, and consider the step function

$$f^{(\varepsilon)}(x) = f(x_k) \quad \text{if } x_{k-1} \le x < x_k \qquad (k = 1, \ldots, n),$$

which can obviously be written in the form

$$f^{(\varepsilon)}(x) = \sum_{k=1}^{n} f(x_k)[f_{x_k}(x) - f_{x_{k-1}}(x)], \tag{38}$$

where $f_\tau$ is the function defined by (34). Clearly,

$$|f(x) - f^{(\varepsilon)}(x)| < \varepsilon$$

for all $x \in [a, b]$,$^{13}$ i.e.,

$$\|f - f^{(\varepsilon)}\| < \varepsilon. \tag{39}$$

It follows from (35) and (38) that

$$\varphi(f^{(\varepsilon)}) = \sum_{k=1}^{n} f(x_k)[\varphi(f_{x_k}) - \varphi(f_{x_{k-1}})] = \sum_{k=1}^{n} f(x_k)[\Phi(x_k) - \Phi(x_{k-1})],$$

i.e., $\varphi(f^{(\varepsilon)})$ is an “approximating sum” of the Riemann-Stieltjes integral

$$\int_a^b f(x)\,d\Phi(x).$$

Therefore

$$\Big|\varphi(f^{(\varepsilon)}) - \int_a^b f(x)\,d\Phi(x)\Big| < \varepsilon$$

for a “sufficiently fine” partition of the interval $[a, b]$. On the other
hand,

$$|\varphi(f) - \varphi(f^{(\varepsilon)})| \le \|\varphi\|\,\|f - f^{(\varepsilon)}\| \le \|\varphi\|\,\varepsilon$$

because of (39). But then

$$\Big|\varphi(f) - \int_a^b f(x)\,d\Phi(x)\Big| < (\|\varphi\| + 1)\,\varepsilon,$$

which implies (32), since $\varepsilon > 0$ is arbitrary. To prove (33), we merely
combine (37) with the opposite inequality

$$\|\varphi\| \le V_a^b(\Phi),$$

which is an immediate consequence of Theorem 2 and the representation
(32). ∎

13 We complete the definition of $f^{(\varepsilon)}$ by setting $f^{(\varepsilon)}(b) = f(x_n) = f(b)$ for every $\varepsilon > 0$.
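The construction (34)–(35) can be traced on the simplest functional. Take the point evaluation $\varphi(f) = f(c)$ at an interior point $c$ of $[0, 1]$. Among the norm-preserving extensions to bounded functions, we assume here the one that is still point evaluation; the Hahn-Banach extension need not be unique. With this choice, $\Phi(\tau) = f_\tau(c)$ is 0 for $\tau \le c$ and 1 for $\tau > c$. The approximating sums $\varphi(f^{(\varepsilon)})$ then pick out $f(x_k)$ on the subinterval containing $c$ and converge to $f(c)$. Along the grid, the variation of $\Phi$ is 1, which equals $\|\varphi\|$. The point $c = 0.3$ and the function $e^x$ are hypothetical choices.

```python
import math

C = 0.3  # hypothetical evaluation point in (a, b) = (0, 1)

def f_tau(tau):
    # the step function (34): 1 for x < tau, 0 for x >= tau
    return lambda x: 1.0 if x < tau else 0.0

def extended_functional(g):
    # assumed extension of f -> f(C) to all bounded functions: still point evaluation
    return g(C)

def Phi(tau):
    # formula (35)
    return extended_functional(f_tau(tau))

def approximating_sum(f, n, a=0.0, b=1.0):
    """phi(f^(eps)) = sum of f(x_k) [Phi(x_k) - Phi(x_{k-1})] on a uniform partition."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(f(x1) * (Phi(x1) - Phi(x0)) for x0, x1 in zip(xs, xs[1:]))

def variation_on_grid(n, a=0.0, b=1.0):
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(Phi(x1) - Phi(x0)) for x0, x1 in zip(xs, xs[1:]))

for n in (10, 100, 1000):
    print(n, approximating_sum(math.exp, n), math.exp(C), variation_on_grid(n))
```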

Problem 1. Let $\mu$ be an arbitrary finite $\sigma$-additive measure on the real
line $(-\infty, \infty)$. Represent $\mu$ as the Stieltjes measure corresponding to some
generating function $F$.

Hint. Let $F(x) = \mu(-\infty, x)$.

Comment. Thus the term “Stieltjes measure” does not refer to a special 
kind of measure, but rather to a special way of constructing a measure (by 
using a generating function). 

Problem 2. Let $\Phi$ be a function of bounded variation with two distinct
representations $\Phi = v - g$, $\Phi = v^* - g^*$ in terms of nondecreasing functions
$v$, $g$, $v^*$ and $g^*$ (give an example). Prove that

$$\int_a^b f(x)\,dv(x) - \int_a^b f(x)\,dg(x) = \int_a^b f(x)\,dv^*(x) - \int_a^b f(x)\,dg^*(x).$$

Comment. Thus in the definition (8) of the Lebesgue-Stieltjes integral
with respect to a function of bounded variation $\Phi$, the particular representation
of $\Phi$ as a difference between two nondecreasing functions does not
matter, i.e., $v$ need not be the total variation of $\Phi$ on $[a, x]$.


Problem 3. Let $\xi$ be the number of spots obtained in throwing an unbiased
die. Find the mean and variance of $\xi$.

Ans. $E\xi = 7/2$, $D\xi = 35/12$.
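The values $E\xi = 7/2$ and $D\xi = 35/12$ can be confirmed with the sums of Example 1, using exact rational arithmetic from Python's standard `fractions` module:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                                   # unbiased die: each face has probability 1/6
mean = sum(x * p for x in faces)                     # E xi = sum of x_n p_n
variance = sum((x - mean) ** 2 * p for x in faces)   # D xi = sum of (x_n - E xi)^2 p_n
print(mean, variance)  # 7/2 35/12
```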

Problem 4. Find the mean and variance of the random variable $\xi$ with
probability density

$$p(x) = \tfrac{1}{2}e^{-|x|} \qquad (-\infty < x < \infty).$$

Problem 5. Let $\xi$ be the random variable with probability density

$$p(x) = \frac{1}{\pi(1 + x^2)} \qquad (-\infty < x < \infty).$$

Prove that $E\xi$ and $D\xi$ fail to exist.
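Numerically, the trouble shows up in the truncated integrals of $|x|\,p(x)$. For this density,

$$\int_{-R}^{R} \frac{|x|}{\pi(1 + x^2)}\,dx = \frac{\ln(1 + R^2)}{\pi},$$

which grows without bound as $R \to \infty$, so $x\,p(x)$ is not summable on the line. The Python sketch below is an illustration, not a proof. It evaluates this closed form and cross-checks it with a midpoint rule; the step count is an arbitrary choice.

```python
import math

def abs_moment_closed_form(R):
    # integral of |x| / (pi (1 + x^2)) over [-R, R]
    return math.log1p(R * R) / math.pi

def abs_moment_midpoint(R, n=200_000):
    h = 2 * R / n
    xs = (-R + (k + 0.5) * h for k in range(n))
    return sum(abs(x) / (math.pi * (1 + x * x)) for x in xs) * h

for R in (10, 100, 1000, 10_000):
    print(R, abs_moment_closed_form(R), abs_moment_midpoint(R))
```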

Problem 6. Discuss random variables which are neither discrete nor 
continuous. 

Problem 7. Given a random variable $\xi$ with distribution function $F$,
consider the new random variable $\eta = \varphi(\xi)$, where $\varphi$ is a function summable
with respect to the Stieltjes measure $\mu_F$ generated by $F$. Express $E\eta$ and
$D\eta$ in terms of $F$.

Hint. Consider the problem of changing variables in a Lebesgue integral.
Ans. For example, $E\eta = \int_{-\infty}^{\infty} \varphi(x)\,dF(x)$.

Problem 8. Prove that if $f$ is continuous on $[a, b]$, then the Riemann-Stieltjes
integral

$$\int_a^b f(x)\,d\Phi(x) \tag{40}$$

does not depend on the values taken by $\Phi$ at its discontinuity points in $(a, b)$.
Hint. Use Theorem 3 and formula (13).

Comment. Hence if $f$ is continuous, we need not insist that $\Phi$ be continuous
from the left at its discontinuity points in $(a, b)$. In fact, $\Phi$ can be
assigned arbitrary values at these points.

Problem 9. Write formulas for the Riemann-Stieltjes integral (40) in the
case where $f$ is continuous and

a) $\Phi$ is a jump function;

b) $\Phi$ is an absolutely continuous function with a Riemann-integrable
derivative.
Problem 10. Evaluate the following Riemann-Stieltjes integrals:

a) $\int_{-1}^{3} x\,dF(x)$, where
$$F(x) = \begin{cases} 0 & \text{if } x = -1, \\ 1 & \text{if } -1 < x \le 2, \\ -1 & \text{if } 2 < x \le 3; \end{cases}$$

b) $\int_{0}^{2} x^2\,dF(x)$, where
$$F(x) = \begin{cases} -1 & \text{if } 0 \le x < \ldots, \\ 0 & \text{if } \ldots \le x < \ldots, \\ 2 & \text{if } x = \ldots, \\ -2 & \text{if } \ldots < x \le 2; \end{cases}$$

c) $\int_{0}^{1} x^2\,dF(x)$, where
$$F(x) = \begin{cases} x & \text{if } 0 \le x < \ldots, \\ \ldots & \text{if } \ldots \le x \le 1. \end{cases}$$

Problem 11. Develop a theory of Riemann-Stieltjes integration on the
whole real line $(-\infty, \infty)$.

Problem 12. Extend Theorem 4 to the case where $a = -\infty$ or $b = \infty$
(or both), assuming that $f(x)$ approaches a limit as $x \to \pm\infty$.

Problem 13. Let $\{\Phi_n\}$ be the same as in Theorem 4, and let $\{f_n\}$ be a
sequence of continuous functions on $[a, b]$ converging uniformly to a limit $f$.
Prove that

$$\lim_{n\to\infty} \int_a^b f_n(x)\,d\Phi_n(x) = \int_a^b f(x)\,d\Phi(x).$$

Problem 14. Prove that there is a one-to-one correspondence between
the set of all continuous linear functionals $\varphi$ on $C_{[a,b]}$ and the space $V^0_{[a,b]}$
of Problem 8, p. 332, provided we identify any two elements of $V^0_{[a,b]}$ which
coincide at all their continuity points. Prove that the inequality

$$V_a^b(\Phi) \le \|\varphi\|$$

need not hold for every $\Phi \in V^0_{[a,b]}$ corresponding to a given functional
$\varphi \in C^*_{[a,b]}$, but that there is always at least one such element $\Phi$ for which
the inequality holds.


37. The Spaces $L_1$ and $L_2$

37.1. Definition and basic properties of $L_1$. Let $X$ be a space equipped
with a measure $\mu$, where the measure of $X$ itself may be either finite or
infinite. Then by $L_1(X, \mu)$, or simply $L_1$, we mean the set of all real functions
$f$ summable on $X$ (however, see Problem 1). Clearly $L_1$ is a linear space
(with addition of functions and multiplication of functions by numbers
defined in the usual way), since a linear combination of summable functions
is again a summable function. To introduce a norm in $L_1$, we define

$$\|f\| = \int |f(x)|\,d\mu, \tag{1}$$

where, as in the rest of this section, the symbol $\int$ by itself denotes integration
over the whole space $X$. Of the various properties of a norm (see p. 138),
it follows at once from (1) that

$$\|f\| \ge 0, \qquad \|\alpha f\| = |\alpha|\,\|f\|, \qquad \|f_1 + f_2\| \le \|f_1\| + \|f_2\|,$$

and we need only verify that $\|f\| = 0$ if and only if $f = 0$. To insure this,
we agree to regard equivalent functions (i.e., functions differing only on
a set of measure zero) as identical elements of the space $L_1$. Thus the
elements of $L_1$ are, to be perfectly exact, classes of equivalent summable
functions.$^{14}$ In particular, the zero element of $L_1$ is the class consisting of all
functions vanishing almost everywhere. With this understanding, we will
continue to talk (more casually) about “functions in $L_1$.”

14 Thus the precise definition of addition of two elements $\varphi_1, \varphi_2 \in L_1$ is the following:
Let $f_1$ and $f_2$ be “representatives” of $\varphi_1$ and $\varphi_2$, respectively, i.e., let $f_1 \in \varphi_1$, $f_2 \in \varphi_2$. Then
$\varphi_1 + \varphi_2$ is the class containing $f_1 + f_2$ (this class clearly does not depend on the particular
choice of $f_1$ and $f_2$).

In $L_1$, as in any normed linear space, we can use the formula

$$\rho(f, g) = \|f - g\|$$

to define a distance. Let $\{f_n\}$ be a sequence of functions in $L_1$. Then $\{f_n\}$
is said to converge in the mean to a function $f \in L_1$ if $\rho(f_n, f) \to 0$ as $n \to \infty$.

Theorem 1. The space $L_1$ is complete.

Proof. Let $\{f_n\}$ be a Cauchy sequence in $L_1$, so that

$$\|f_m - f_n\| \to 0 \quad \text{as } m, n \to \infty.$$

Then we can find a sequence of indices $\{n_k\}$ (where $n_1 < n_2 < \cdots <
n_k < \cdots$) such that

$$\|f_{n_k} - f_{n_{k+1}}\| = \int |f_{n_k}(x) - f_{n_{k+1}}(x)|\,d\mu < \frac{1}{2^k} \qquad (k = 1, 2, \ldots).$$

It follows from the corollary to Levi’s theorem (see p. 307) that the series

$$|f_{n_1}| + |f_{n_2} - f_{n_1}| + \cdots$$
converges almost everywhere on $X$. Therefore the series

$$f_{n_1} + (f_{n_2} - f_{n_1}) + \cdots$$

also converges almost everywhere on $X$ to some function

$$f(x) = \lim_{k\to\infty} f_{n_k}(x).$$

But $\{f_n\}$ converges in the mean to the same function $f$. In fact, given
any $\varepsilon > 0$,

$$\int |f_{n_k}(x) - f_{n_l}(x)|\,d\mu < \varepsilon \tag{2}$$

for sufficiently large $k$ and $l$, since $\{f_n\}$ is a Cauchy sequence. Hence,
by Fatou’s theorem (Theorem 3, p. 307), we can take the limit as $l \to \infty$
behind the integral sign in (2), obtaining

$$\int |f_{n_k}(x) - f(x)|\,d\mu \le \varepsilon.$$

It follows that $f \in L_1$ (why?) and that $f_{n_k} \to f$ in the mean. But if a Cauchy
sequence contains a subsequence converging to a limit, then the sequence
itself must converge to the same limit. Hence $f_n \to f$ in the mean. ∎

According to the definition of the Lebesgue integral (see p. 296), given
any function $f$ summable on $X$ and any $\varepsilon > 0$, there is a summable simple
function $\varphi(x)$ such that

$$\int |f(x) - \varphi(x)|\,d\mu < \varepsilon.$$

Moreover, the Lebesgue integral of a summable simple function $\varphi$ taking
values $y_1, y_2, \ldots$ on sets $E_1, E_2, \ldots$ is defined as the sum of the series

$$\sum_{n=1}^{\infty} y_n\,\mu(E_n)$$

(assumed to converge absolutely). Therefore every summable simple function
can be represented as the limit in the mean (i.e., as the limit in the sense of
convergence in the mean) of a sequence of summable simple functions,
each taking only finitely many values. In fact, given any $\varepsilon > 0$, let $N$ be
such that

$$\sum_{n=N+1}^{\infty} |y_n|\,\mu(E_n) < \varepsilon,$$

and let$^{15}$

$$\varphi_N(x) = \begin{cases} \varphi(x) & \text{if } x \in E_k,\ 1 \le k \le N, \\ 0 & \text{otherwise}. \end{cases}$$

Then

$$\int |\varphi(x) - \varphi_N(x)|\,d\mu = \sum_{n=N+1}^{\infty} |y_n|\,\mu(E_n) < \varepsilon.$$

In other words, the set of all simple functions taking only finitely many values
is everywhere dense in the space $L_1$.

15 Note that $\varphi_N$ is a finite linear combination of characteristic functions, namely
$\varphi_N(x) = y_1\chi_{E_1}(x) + \cdots + y_N\chi_{E_N}(x)$
(see footnote 11, p. 349).

Theorem 2. Let $X$ be a metric space equipped with a measure $\mu$ such
that$^{16}$

1) Every open set and every closed set in $X$ is measurable;

2) If a set $M \subset X$ is measurable, then

$$\mu(M) = \inf_{G \supset M} \mu(G), \tag{3}$$

where the greatest lower bound is taken over all open sets $G \subset X$
containing $M$.

Then the set of all continuous functions on $X$ is everywhere dense in
$L_1(X, \mu)$.

Proof. We need only show that every simple function taking only
finitely many values is the limit in the mean of a sequence of continuous
functions. But every simple function taking only finitely many values is
a finite linear combination of characteristic functions of measurable sets,
and hence we need only show that every such characteristic function
$\chi_M(x)$ is the limit in the mean of a sequence of continuous functions.
If $M \subset X$ is measurable, then (3) implies that given any $\varepsilon > 0$, there is a
closed set $F_M$ and an open set $G_M$ such that

$$F_M \subset M \subset G_M, \qquad \mu(G_M) - \mu(F_M) < \varepsilon. \tag{4}$$

Now let$^{17}$

$$\varphi_\varepsilon(x) = \frac{\rho(x, X - G_M)}{\rho(x, X - G_M) + \rho(x, F_M)},$$

so that

$$\varphi_\varepsilon(x) = \begin{cases} 0 & \text{if } x \in X - G_M, \\ 1 & \text{if } x \in F_M. \end{cases}$$

Moreover, $\varphi_\varepsilon$ is continuous, since $\rho(F_M, x)$ and $\rho(X - G_M, x)$ are both
continuous functions, with a nonvanishing sum. But $|\chi_M - \varphi_\varepsilon|$ does not
exceed 1 on $G_M - F_M$, and vanishes outside this set. Using (4), we find that

$$\int |\chi_M(x) - \varphi_\varepsilon(x)|\,d\mu < \varepsilon. \quad ∎$$
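On the real line with Lebesgue measure, the function $\varphi_\varepsilon$ of the proof is easy to compute. The hypothetical example below takes $M = F_M = [0, 1]$ and $G_M = (-\delta, 1 + \delta)$. Then $\varphi_\varepsilon$ is 1 on $[0, 1]$ and ramps linearly to 0 across the two margins of width $\delta$. The $L_1$ distance from $\chi_M$ is therefore exactly $\delta$, below the bound $\mu(G_M - F_M) = 2\delta$ used in the proof. The Python sketch checks this with a midpoint rule on $[-1, 2]$; the interval and step count are arbitrary choices, and $\delta \le 1$ is assumed.

```python
def dist_to_F(x):
    # rho(x, F) for the closed set F = [0, 1]
    return max(0.0, -x, x - 1.0)

def dist_to_outside_G(x, delta):
    # rho(x, X - G) for the open set G = (-delta, 1 + delta)
    return max(0.0, min(x + delta, 1.0 + delta - x))

def phi_eps(x, delta):
    u, v = dist_to_outside_G(x, delta), dist_to_F(x)
    return u / (u + v)  # u + v > 0 because F and X - G are disjoint closed sets

def chi_M(x):
    # characteristic function of M = [0, 1]
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def l1_error(delta, lo=-1.0, hi=2.0, n=150_000):
    h = (hi - lo) / n
    xs = (lo + (k + 0.5) * h for k in range(n))
    return sum(abs(chi_M(x) - phi_eps(x, delta)) for x in xs) * h

for delta in (0.5, 0.1, 0.01):
    print(delta, l1_error(delta))  # close to delta, below mu(G - F) = 2 * delta
```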


16 These conditions are satisfied by ordinary Lebesgue measure in $n$-space, and in
many other cases of practical interest.

17 As usual, $\rho(A, x)$ denotes the distance between the set $A$ and the point $x$ (see Problem
9, p. 54).
The space $L_1(X, \mu)$ depends on the choice of both the space $X$ and the
measure $\mu$. For example, $L_1(X, \mu)$ is essentially a finite-dimensional space
if $\mu$ is concentrated on a finite set of points (why?). In analysis, we are
mainly interested in the case where $L_1$ is infinite-dimensional but has a
countable everywhere dense subset.$^{18}$ To characterize such spaces, we
introduce the following concept, stemming from general measure theory:

Definition. Suppose a space $X$ equipped with a measure $\mu$ has a
countable system $\mathcal{A}$ of measurable subsets $A_1, A_2, \ldots$ such that given any
$\varepsilon > 0$ and any measurable subset $M \subset X$, there is a set $A_k \in \mathcal{A}$ satisfying
the inequality

$$\mu(M \,\Delta\, A_k) < \varepsilon.$$

Then $\mu$ is said to have a countable base, consisting of the sets $A_1, A_2, \ldots$.

Example. Let $\mu$ be a Lebesgue extension of a measure $m$ originally
defined on a countable semiring $\mathcal{S}_m$. Then the ring $\mathcal{R}(\mathcal{S}_m)$ generated by $\mathcal{S}_m$ is obviously
itself countable, and hence, by Theorem 3, p. 277, is a countable base for $\mu$.
In particular, ordinary Lebesgue measure on the line has a countable base,
since we can choose the original semiring $\mathcal{S}_m$ to consist of all intervals (open,
closed and half-open) with rational end points.

Theorem 3. Let $X$ be a space equipped with a measure $\mu$, and suppose
$\mu$ has a countable base $A_1, A_2, \ldots$. Then $L_1(X, \mu)$ has a countable
everywhere dense subset.

Proof. We will show that the set $M$ of all finite linear combinations
of the form

$$\sum_{k=1}^{n} c_k f_k(x), \tag{5}$$

where $f_k$ is the characteristic function of $A_k$ and the numbers $c_1, \ldots, c_n$
are rational, forms a countable everywhere dense subset of $L_1 = L_1(X, \mu)$.
The countability of $M$ is obvious, and we need only show that $M$ is
everywhere dense in $L_1$. As already noted, the set of all simple functions
taking only finitely many values is everywhere dense in $L_1$. But every such
function can be approximated arbitrarily closely by a function of the same
type taking only rational values. Hence we need only show that every
function $f$ taking rational values $y_1, \ldots, y_n$ on pairwise disjoint sets
$E_1, \ldots, E_n$ (with $X$ as their union) can be approximated arbitrarily
closely in the $L_1$-metric by functions of the form (5). Clearly, there is
no loss of generality in assuming that the base $A_1, A_2, \ldots$ is closed under
the operations of taking differences and forming finite unions and
intersections (why?).

18 So that $L_1$ is separable, as defined on p. 48.
Now, according to the definition, given any $\varepsilon > 0$, there are sets
$A_1, \ldots, A_n$ in the base such that

$$\mu[(E_k - A_k) \cup (A_k - E_k)] < \varepsilon \qquad (k = 1, \ldots, n).$$

Let

$$A'_k = A_k - \bigcup_{j<k} A_j \qquad (k = 1, \ldots, n),$$

and define a function

$$f^*(x) = \begin{cases} y_k & \text{if } x \in A'_k, \\ 0 & \text{if } x \in X - \bigcup_{k=1}^{n} A'_k. \end{cases}$$

Then clearly

$$\mu\{x : f(x) \ne f^*(x)\} \le \sum_{k=1}^{n} \mu[(E_k - A_k) \cup (A_k - E_k)] < n\varepsilon,$$

and hence the left-hand side of

$$\int |f(x) - f^*(x)|\,d\mu \le 2\big(\max_n |y_n|\big)\,\mu\{x : f(x) \ne f^*(x)\}$$

can be made arbitrarily small by choosing $\varepsilon > 0$ sufficiently small. This
proves the theorem, since $f^*$ is a function of the form (5). ∎


37.2. Definition and basic properties of $L_2$. As we have seen, the space
$L_1 = L_1(X, \mu)$ is a Banach space, i.e., a complete normed linear space.
However, $L_1$ is not Euclidean, since its norm cannot be derived from any
scalar product. This follows from the “parallelogram theorem” (Theorem
15, p. 160). For example, if $X = [0, 2\pi]$ and $\mu$ is ordinary Lebesgue measure
on the line, then the condition

$$\|f + g\|^2 + \|f - g\|^2 = 2(\|f\|^2 + \|g\|^2)$$

fails for the summable functions $f(x) \equiv 1$, $g(x) = \sin x$.$^{19}$ To get a function
space which is not only a normed linear space but also a Euclidean space,
we now consider the set of functions whose squares are summable.
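For these two functions the $L_1$ norms are explicit. Since $1 \pm \sin x \ge 0$, we have $\|f \pm g\| = 2\pi$, while $\|f\| = 2\pi$ and $\|g\| = \int_0^{2\pi}|\sin x|\,dx = 4$. The left side of the condition is therefore $8\pi^2$ and the right side is $8\pi^2 + 32$. The Python sketch below reproduces these numbers with a midpoint rule; the quadrature and the number of nodes are arbitrary choices.

```python
import math

def l1_norm(h, a=0.0, b=2 * math.pi, n=100_000):
    """Midpoint-rule approximation of the L_1 norm of h on [a, b]."""
    step = (b - a) / n
    return sum(abs(h(a + (k + 0.5) * step)) for k in range(n)) * step

def f(x):
    return 1.0

g = math.sin

lhs = l1_norm(lambda x: f(x) + g(x)) ** 2 + l1_norm(lambda x: f(x) - g(x)) ** 2
rhs = 2 * (l1_norm(f) ** 2 + l1_norm(g) ** 2)
print(lhs, rhs, rhs - lhs)  # the difference is close to 32
```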

Thus let $X$ be a space equipped with a measure $\mu$, where we temporarily
assume that $\mu(X) < \infty$. Then by $L_2(X, \mu)$, or simply $L_2$, we mean the set of
all real functions $f$ whose squares are summable on $X$, i.e., which satisfy
the condition

$$\int f^2(x)\,d\mu < \infty$$

(however, see Problem 6). As in the case of $L_1$, we do not distinguish
between equivalent functions (i.e., functions differing only on a set of
measure zero).

19 As an exercise, show that the same kind of counterexample works quite generally.
Theorem 4. If $f$ and $g$ belong to $L_2$, then so do $\alpha f$ and $f + g$, where
$\alpha$ is an arbitrary constant, while the product $fg$ is summable. In particular, $L_2$ is a linear space.

Proof. Obviously $\alpha f \in L_2$, since

$$\int [\alpha f(x)]^2\,d\mu = \alpha^2 \int f^2(x)\,d\mu < \infty.$$

The fact that $fg$ is summable follows from the inequality

$$|f(x)g(x)| \le \tfrac{1}{2}[f^2(x) + g^2(x)] \tag{6}$$

and Theorem 3, p. 297.$^{20}$ But then $f + g \in L_2$, since

$$[f(x) + g(x)]^2 \le f^2(x) + 2|f(x)g(x)| + g^2(x),$$

where each term on the right is summable. ∎

Next we define a scalar product in $L_2$, setting

$$(f, g) = \int f(x)g(x)\,d\mu.$$

This choice obviously has all the properties of a scalar product listed on
p. 142:

1) $(f, f) \ge 0$, where $(f, f) = 0$ if and only if $f = 0$;

2) $(f, g) = (g, f)$;

3) $(\lambda f, g) = \lambda(f, g)$;

4) $(f, g_1 + g_2) = (f, g_1) + (f, g_2)$.

(In asserting that $(f, f) = 0$ if and only if $f = 0$, we rely on the fact that
every function vanishing almost everywhere is identified with the zero element
of $L_2$.) Thus $L_2$ is a Euclidean space, with the norm defined by the usual
formula

$$\|f\| = \sqrt{(f, f)} \tag{7}$$

(recall Theorem 1, p. 142). In the case of $L_2$, (7) takes the form

$$\|f\| = \sqrt{\int f^2(x)\,d\mu}.$$

By the same token, the distance between two elements $f, g \in L_2$ is just

$$\rho(f, g) = \|f - g\| = \sqrt{\int [f(x) - g(x)]^2\,d\mu}.$$

The quantity

$$\int [f(x) - g(x)]^2\,d\mu = \|f - g\|^2$$

is called the mean square deviation of the functions $f$ and $g$ (from each other).

20 Setting $g(x) \equiv 1$ in (6), we find that $f \in L_2$ implies $f \in L_1$ (provided that $X$ is of finite
measure).


Let $\{f_n\}$ be a sequence of functions in $L_2$. Then $\{f_n\}$ is said to converge in
the mean square to a function $f \in L_2$ if $\rho(f_n, f) \to 0$ as $n \to \infty$.

In $L_2$, as in any other Euclidean space, we have the Schwarz inequality

$$|(f, g)| \le \|f\|\,\|g\|,$$

which here takes the form

$$\Big|\int f(x)g(x)\,d\mu\Big| \le \sqrt{\int f^2(x)\,d\mu}\,\sqrt{\int g^2(x)\,d\mu}. \tag{8}$$

The $L_2$-version of the triangle inequality

$$\|f + g\| \le \|f\| + \|g\|$$

is clearly

$$\sqrt{\int [f(x) + g(x)]^2\,d\mu} \le \sqrt{\int f^2(x)\,d\mu} + \sqrt{\int g^2(x)\,d\mu}.$$

In particular, replacing $f$ by $|f|$ and setting $g(x) \equiv 1$ in (8), we get

$$\int |f(x)|\,d\mu \le \sqrt{\mu(X)}\,\sqrt{\int f^2(x)\,d\mu}, \tag{9}$$

from which it is again apparent (cf. footnote 20) that $f \in L_2$ implies $f \in L_1$
if $\mu(X) < \infty$.
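In the simplest case, $X = [0, 1]$ with Lebesgue measure, the scalar product, the Schwarz inequality (8), and the bound (9) can be checked numerically. The Python sketch approximates every integral with a midpoint rule, which is an assumption of the illustration. For $f(x) = x$ and $g(x) = x^2$, the mean square deviation is exactly $\int_0^1 (x - x^2)^2\,dx = 1/3 - 1/2 + 1/5 = 1/30$.

```python
import math

N = 10_000
H = 1.0 / N
NODES = [(k + 0.5) * H for k in range(N)]  # midpoints of a uniform partition of [0, 1]

def integral(h):
    return sum(h(x) for x in NODES) * H

def inner(f, g):
    # scalar product (f, g) = integral of f g over X = [0, 1]
    return integral(lambda x: f(x) * g(x))

def norm(f):
    return math.sqrt(inner(f, f))

def f(x):
    return x

def g(x):
    return x * x

msd = integral(lambda x: (f(x) - g(x)) ** 2)        # mean square deviation ||f - g||^2
schwarz_ok = abs(inner(f, g)) <= norm(f) * norm(g)   # inequality (8)
l1 = integral(lambda x: abs(f(x)))
bound_9 = math.sqrt(1.0) * norm(f)                   # sqrt(mu(X)) * ||f||, with mu(X) = 1
print(msd, schwarz_ok, l1 <= bound_9)
```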

Theorem 5. The space $L_2$ is complete.

Proof. Let $\{f_n\}$ be a Cauchy sequence in $L_2$, so that

$$\|f_m - f_n\| \to 0 \quad \text{as } m, n \to \infty.$$

Then, by (9), given any $\varepsilon > 0$, we have

$$\int |f_m(x) - f_n(x)|\,d\mu \le \sqrt{\mu(X)}\,\sqrt{\int [f_m(x) - f_n(x)]^2\,d\mu} < \varepsilon\sqrt{\mu(X)}$$

for sufficiently large $m$ and $n$, i.e., $\{f_n\}$ is also a Cauchy sequence in the
$L_1$-metric. Repeating the argument given in the proof of the completeness
of $L_1$, we choose a subsequence $\{f_{n_k}\}$ from $\{f_n\}$ converging almost
everywhere to some function $f$. Clearly, given any $\varepsilon > 0$, we have

$$\int [f_{n_k}(x) - f_{n_l}(x)]^2\,d\mu < \varepsilon \tag{10}$$

for sufficiently large $k$ and $l$. Hence, by Fatou’s theorem (Theorem 3,
p. 307), we can take the limit as $l \to \infty$ behind the integral sign in (10),
obtaining

$$\int [f_{n_k}(x) - f(x)]^2\,d\mu \le \varepsilon.$$

It follows that $f \in L_2$ (why?) and that $f_{n_k} \to f$ in the mean square. But if
a Cauchy sequence contains a subsequence converging to a limit, then the
sequence itself must converge to the same limit. Hence $f_n \to f$ in the
mean square. ∎

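We now drop the restriction $\mu(X) < \infty$, allowing $X$ to have infinite
measure. In the case $\mu(X) = \infty$, it is no longer true that $f \in L_2$ implies
$f \in L_1$, a fact deduced from (6) or (9) in the case $\mu(X) < \infty$. For example,
let $X$ be the real line equipped with ordinary Lebesgue measure, and let

$$f(x) = \frac{1}{\sqrt{1 + x^2}}.$$

Then $f$ belongs to $L_2$ but not to $L_1$, since

$$\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \pi < \infty, \qquad \text{while} \qquad \int_{-\infty}^{\infty} \frac{dx}{\sqrt{1 + x^2}} = \infty.$$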
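Truncating the two integrals at $\pm R$ makes the contrast concrete. The closed forms $\int_{-R}^{R} f^2\,dx = 2\arctan R$ and $\int_{-R}^{R} |f|\,dx = 2\,\operatorname{arsinh} R = 2\ln\big(R + \sqrt{1 + R^2}\big)$ show that the first stays below $\pi$, while the second grows like $2\ln(2R)$. A short Python sketch using only the standard `math` module:

```python
import math

def l2_part(R):
    # integral of f(x)^2 = 1 / (1 + x^2) over [-R, R]
    return 2 * math.atan(R)

def l1_part(R):
    # integral of |f(x)| = 1 / sqrt(1 + x^2) over [-R, R]
    return 2 * math.asinh(R)

for R in (10, 1e3, 1e6, 1e12):
    print(R, l2_part(R), l1_part(R))
```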


Moreover, if a sequence $\{f_n\}$ converges to a limit $f$ in the $L_2$-metric, it
follows from (9) that $\{f_n\}$ also converges to $f$ in the $L_1$-metric if $\mu(X) < \infty$.
However, this conclusion fails if $\mu(X) = \infty$, as shown by the example

$$f_n(x) = \begin{cases} 1/n & \text{if } |x| \le n, \\ 0 & \text{if } |x| > n, \end{cases}$$

where $\{f_n\}$ approaches no limit in $L_1$ but approaches the zero function in $L_2$
(give the details). Despite all this, we have$^{21}$

Theorem 5'. The space $L_2$ is complete even if $\mu(X) = \infty$, provided
that $\mu$ is $\sigma$-finite.

Proof. As in Sec. 30.2, let

$$X = \bigcup_n X_n, \qquad \mu(X_n) < \infty, \qquad X_1 \subset X_2 \subset \cdots \subset X_n \subset \cdots.$$

Moreover, given any function $\varphi$ on $X$, let

$$\varphi^{(n)}(x) = \begin{cases} \varphi(x) & \text{if } x \in X_n, \\ 0 & \text{if } x \notin X_n, \end{cases}$$

so that

$$\int \varphi(x)\,d\mu = \lim_{n\to\infty} \int_{X_n} \varphi(x)\,d\mu = \lim_{n\to\infty} \int \varphi^{(n)}(x)\,d\mu$$

if $\varphi$ is summable on $X$. Let $\{f_n\}$ be a Cauchy sequence in $L_2$, so that,
given any $\varepsilon > 0$,

$$\int [f_k(x) - f_l(x)]^2\,d\mu < \varepsilon$$

for all sufficiently large $k$ and $l$. Then

$$\lim_{n\to\infty} \int [f_k^{(n)}(x) - f_l^{(n)}(x)]^2\,d\mu = \int [f_k(x) - f_l(x)]^2\,d\mu < \varepsilon,$$

and hence, a fortiori,

$$\int_{X_n} [f_k(x) - f_l(x)]^2\,d\mu < \varepsilon \tag{11}$$

for every $n$. But $L_2(X_n, \mu)$ is complete, by Theorem 5, since $\mu(X_n) < \infty$. Therefore
$\{f_l\}$ converges in the metric of $L_2(X_n, \mu)$ to a function $f^{(n)} \in L_2(X_n, \mu)$.
Taking the limit as $l \to \infty$ behind the integral sign in (11), we get

$$\int_{X_n} [f_k(x) - f^{(n)}(x)]^2\,d\mu \le \varepsilon \tag{12}$$

(why is this justified?). Since (12) holds for every $n$, we can now take
the limit as $n \to \infty$, obtaining

$$\lim_{n\to\infty} \int_{X_n} [f_k(x) - f^{(n)}(x)]^2\,d\mu \le \varepsilon. \tag{13}$$

Now let

$$f(x) = f^{(n)}(x) \quad \text{if } x \in X_n.$$

Then (13) implies

$$\int [f_k(x) - f(x)]^2\,d\mu \le \varepsilon.$$

It follows that $f \in L_2(X, \mu)$ and $f_k \to f$ in the mean square. ∎

21 Note that in the proof of the completeness of $L_1$ (Theorem 1), $X$ can have either
finite or infinite measure.

Problem 1. A complex function is said to be summable if its real and 
imaginary parts are summable. Show that the considerations of Sec. 37.1 
carry over verbatim to the case where $L_1$ consists of all complex summable
functions (defined on X). 

Problem 2. Prove that if each of the measures $\mu_1$ and $\mu_2$ has a countable
base, then so does their direct product $\mu = \mu_1 \times \mu_2$.

Comment. In particular, Lebesgue measure in the plane (or more 
generally in n-space) has a countable base. 
Problem 3. Let $X$ be the interval $[a, b]$, and let $\mu$ be ordinary Lebesgue
measure on the line. Prove that the set $\mathcal{P}$ of all polynomials on $[a, b]$ with
rational coefficients is everywhere dense in $L_1(X, \mu)$.

Hint. Use Theorem 2 and the fact that every function continuous on
$[a, b]$ can be approximated in the mean (or even uniformly) by elements of $\mathcal{P}$.

Problem 4. Prove that $L_2(X, \mu)$ is separable, i.e., has a countable everywhere
dense subset, if $\mu$ has a countable base.

Comment. Thus $L_2(X, \mu)$ is a Hilbert space if $\mu$ has a countable base
(we disregard the case where $L_2(X, \mu)$ is finite-dimensional). It follows from
Theorem 11, p. 155 that all such spaces are isomorphic, in particular, that
$L_2(X, \mu)$ is isomorphic to the space $l_2$ of all sequences $(x_1, x_2, \ldots, x_n, \ldots)$
such that

\[ \sum_{n=1}^{\infty} x_n^2 < \infty \]

(in fact, $l_2$ corresponds to the case where the measure $\mu$ is concentrated on a
countable set of points).

Problem 5. Prove that every continuous linear functional $\varphi$ on $L_2(X, \mu)$,
where $\mu$ has a countable base, can be represented in the form

\[ \varphi(f) = \int_X f(x) g(x) \, d\mu, \]

where $g$ is a fixed element of $L_2(X, \mu)$.

Hint. Recall Theorem 2, p. 188.

Problem 6. Show that the considerations of Sec. 37.2 carry over verbatim
to the case where $L_2$ consists of all complex functions $f$ satisfying the condition

\[ \int_X |f(x)|^2 \, d\mu < \infty, \]

provided the scalar product of two such functions $f$ and $g$ is now defined as

\[ (f, g) = \int_X f(x) \overline{g(x)} \, d\mu. \]

Show that the resulting space $L_2$ is a complex Hilbert space if the measure $\mu$
has a countable base (again disregard the finite-dimensional case).

Problem 7. Let $\{f_n\}$ be a sequence of functions defined on a space $X$
equipped with a measure $\mu$ such that $\mu(X) < \infty$. Prove that

a) If $\{f_n\}$ converges uniformly, then $\{f_n\}$ converges in the mean and in
the mean square;

b) If $\{f_n\}$ converges in the mean or in the mean square, then $\{f_n\}$
converges in measure (as defined in Problem 6, p. 292);

c) If $\{f_n\}$ converges in the mean or in the mean square, then $\{f_n\}$ contains
a subsequence $\{f_{n_k}\}$ which converges almost everywhere.

Hint. See Problem 9, p. 292. Alternatively, recall the proof of Theorem 1.

Problem 8. Prove that the sequence of functions constructed in Problem 
8, p. 292 converges to f(x) = 0 in the mean and in the mean square, without 
converging at a single point. 

Problem 9. Give an example of a sequence of functions $\{f_n\}$ which
converges everywhere on $[0, 1]$, but does not converge in the mean.

Hint. Let

\[ f_n(x) = \begin{cases} n & \text{if } x \in (0, 1/n), \\ 0 & \text{otherwise}. \end{cases} \]
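The hint can be checked directly (a supplementary sketch; the helper names `f` and `integral_of_f` are illustrative). For each fixed $x \in [0, 1]$ we have $f_n(x) = 0$ as soon as $n \ge 1/x$, and $f_n(0) = 0$ for every $n$, yet $\int_0^1 f_n(x)\,dx = n \cdot (1/n) = 1$ for every $n$, so $\{f_n\}$ does not tend to its pointwise limit $0$ in the mean. Exact rational arithmetic via `fractions.Fraction` avoids rounding at the endpoint $1/n$.

```python
from fractions import Fraction

def f(n, x):
    """Hint sequence: f_n(x) = n on the open interval (0, 1/n), 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_of_f(n):
    """Integral of f_n over [0, 1]: a step of height n on an interval of length 1/n."""
    return n * Fraction(1, n)

# Pointwise convergence: for fixed x > 0, f_n(x) = 0 for all n >= 1/x; f_n(0) = 0 always.
for x in (Fraction(0), Fraction(1, 1000), Fraction(1, 2), Fraction(1)):
    tail = [f(n, x) for n in range(1000, 1010)]
    print(f"x = {x}: f_n(x) for n = 1000..1009 -> {tail}")  # all zeros

# No convergence to 0 in the mean: the integral equals 1 for every n.
print([integral_of_f(n) for n in (1, 10, 1000)])
```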

Problem 10. Give an example of a sequence of functions $\{f_n\}$ which
converges uniformly, but does not converge in the mean or in the mean
square.

Hint. According to Problem 7a, we must have $\mu(X) = \infty$. Let

\[ f_n(x) = \begin{cases} \dfrac{1}{\sqrt{n}} & \text{if } |x| \le n, \\ 0 & \text{if } |x| > n. \end{cases} \]
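Both this hint and the example of $f_n$ equal to $1/n$ on $|x| \le n$ earlier in this section are "boxes" $g = h \cdot \mathbf{1}_{[-n, n]}$ on the real line, whose distances from the zero function are $\sup |g| = h$, $\int |g|\,dx = 2nh$, and (for the square of the $L_2$ distance) $\int g^2\,dx = 2nh^2$. The small sketch below (the function name `box_norms` is illustrative) tabulates these: with $h = 1/\sqrt{n}$ the sup norm tends to $0$ while $\int |g| = 2\sqrt{n}$ grows and $\int g^2 = 2$ stays fixed; with $h = 1/n$ the $L_2$ distance tends to $0$ while the $L_1$ distance stays equal to $2$.

```python
import math

def box_norms(height, n):
    """Distances from 0 of g(x) = height for |x| <= n, g(x) = 0 otherwise (Lebesgue measure on R).

    Returns (sup |g|, integral of |g|, integral of g^2); the interval [-n, n] has length 2n.
    """
    length = 2 * n
    return abs(height), length * abs(height), length * height ** 2

for n in (1, 100, 10_000):
    # Problem 10 hint, height 1/sqrt(n): sup -> 0, but int|g| and int g^2 do not tend to 0
    sup_a, l1_a, l2sq_a = box_norms(1 / math.sqrt(n), n)
    # Example in the text, height 1/n: int g^2 -> 0, but int|g| stays equal to 2
    _, l1_b, l2sq_b = box_norms(1 / n, n)
    print(f"n={n:6d} | h=1/sqrt(n): sup={sup_a:.4f} int|g|={l1_a:8.2f} int g^2={l2sq_a:.4f}"
          f" | h=1/n: int|g|={l1_b:.4f} int g^2={l2sq_b:.6f}")
```

These are only the distances from $0$; showing that no other limit exists in $L_1$ or $L_2$ still needs the almost-everywhere subsequence argument (compare Problem 7c).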

Problem 11. Show that convergence in the mean need not imply
convergence in the mean square, whether or not $\mu(X) < \infty$.

Problem 12. Let $L_p(X, \mu)$ be the set of all classes of equivalent (real or
complex) functions $f$ such that

\[ \int_X |f|^p \, d\mu < \infty \qquad (1 \le p < \infty), \]

equipped with the norm

\[ \|f\| = \left( \int_X |f|^p \, d\mu \right)^{1/p}. \]

Prove that $L_p(X, \mu)$ is a Banach space.
INDEX


A 

Absolutely continuous charge, 347 
Absolutely continuous function, 336 
Absolutely summable sequence, 185 
Adjoint operator, 232 
in Hilbert space, 234 
Aleph null, 16 
Alexandroff, P. S., 90, 97 
Algebra of sets, 31 
Algebraic dimension, 128 
Algebraic number, 19 
Almost everywhere, 288 
Angle between vectors, 143 
Arzela’s theorem, 102 
generalization of, 107 
Axiom of choice, 27 
Axiom of countability: 
first, 93 
second, 82 
Axiom of separation: 
first, 85 
Hausdorff, 85 
second, 85 

B 

Baire’s theorem, 61 
B-algebra (see Borel algebra) 
Banach, S., 138, 229, 238 
Banach space, 140 
Base, 81 
countable, 382 
neighborhood (local), 83 
Basis, 121 
dual, 185 


Basis (cont.):

Hamel, 128 
orthogonal, 143 
orthonormal, 143 
Bessel’s inequality, 150, 165 
Bicompactum, 96 
Binary relation (see Relation) 

Birkhoff, G., 28

Bolzano-Weierstrass theorem, 101 
Borel algebra, 35 
irreducible, 36 
minimal, 36 
Borel closure, 36 
Borel sets, 36 

Bounded linear functional, 177 
norm of, 177 

Bounded real function, 110 
Bounded set, 65, 141, 169 
locally, 169 
strongly, 197 
weakly, 197 
B-set (see Borel set) 

C 

Cantor, G., 29 
Cantor function, 335 
Cantor set, 52 

points of the first kind of, 53 
points of the second kind of, 53 
Cantor-Bernstein theorem, 17 
Cardinal number, 24 
Cartesian product (see Direct product) 
Cauchy criterion, 56 
Cauchy sequence, 56 
Cauchy-Schwarz inequality, 38 
Chain, 28
maximal, 28 

Characteristic function, 349 
Charge, 344 

absolutely continuous, 347 
concentrated on a set, 346
continuous, 346 
density of, 350 
discrete, 347 
negative, 344 
negative variation of, 346 
positive, 344 
positive variation of, 346 
Radon-Nikodym derivative of, 350 
singular, 347 
total variation of, 346 
Chebyshev’s inequality, 299 
Choice function, 27 
Classes, 6 
equivalence, 8 

Closed ball (see Closed sphere) 

Closed graph theorem, 238 
Closed set(s), 49 
in a topological space, 79 
on the real line, 51 
unions and intersections of, 49 
Closed sphere(s), 46 
center of, 46 

nested (or decreasing) sequence of, 59

radius of, 46 
Closure, 46, 79 
Closure operator, 46 
properties of, 46 
Codimension, 122 
Cohen, P. J., 29 
Compact space, 92 
countably, 95 
locally, 97 
Compactness, 92 
countable, 95 
relative, 97 
relative countable, 97 
Compactum, 92, 96 
metric, 96 

Complement of a set, 3 
Complete limit point, 97 
Complete measure, 280 
Completely continuous operator(s), 239 ff.
basic properties of, 243-246 
in Hilbert space, 246-251 
Completely regular space, 92 


Completion (of a metric space), 62 
Component (of an open set), 55 
Conjugate space, 185 
of a normed linear space, 184 
second, 190 
strong topology in, 190 
third, 190 

weak topology in, 200 
weak* topology in, 202 
Connected set, 55 
Connected space, 84 
Contact point, 46, 79 
Continuity, 44, 87 
from the left, 315 
from the right, 315 
uniform, 109 
Continuous charge, 346 
Continuous linear functional(s), 175 ff. 
order of, 182 
sufficiently many, 181 
Continuum, 16 
power of, 16 

Contraction mapping(s), 66 ff. 
and differential equations, 71-72 
and integral equations, 74-76 
and systems of differential equations, 72-74

principle of, 66 

Convergence almost everywhere, 289 
Convergence in measure, 292 
Convergence in the mean, 379 
Convergence in the mean square, 385 
Convergent sequence: 
in a metric space, 47 
in a topological space, 84 
Convex body, 129 
Convex functional, 130, 134 
Convex hull, 130 
Convex set, 129 
Convexity, 128 

Countability of rational numbers, 11 
Countable additivity, 266, 272 
Countable base, 382 
Countable set, 10 
Countably compact space, 95 
Countably Hilbert space, 173 
Countably normed (linear) space, 171 
complete, 173 
Cover, 83 
closed, 83 
open, 83 

Covering (see Cover) 


Curve(s): 

in a metric space, 112-113 
length of, 114, 115 
sequence of, 115 
rectifiable, 332 

D 

Decomposition of a set into classes, 6-9 
δ-algebra, 35
δ-ring, 35

Delta function, 124, 208 
Dense set, 48 
everywhere, 48 
nowhere, 48, 61 
Density, 350 
Derived numbers, 318 
left-hand lower, 318 
right-hand upper, 318 
Diameter of a set, 65 
Difference between sets, 3 
Differentiation: 

of a monotonic function, 318-323 
of an integral with respect to its upper 
limit, 323-326 
Dimension, 121 
algebraic, 128 
Dini’s theorem, 115 
Direct product, 238, 352 
of measures, 354 
Directed set, 29 

Dirichlet function, 289, 291, 301 
Discontinuity point of the first kind, 315 
Discrete charge, 347 
Discrete space, 38 
Disjoint sets, 2 
pairwise, 2 
Distance: 

between a point and a set, 54 
between two sets, 55 
properties of, 37 
symmetry of, 37 

Domain (of definition), 4, 5, 221 
Domain (open connected set), 71 

E 

Egorov’s theorem, 290 
Eigenvalue, 235 
Eigenvector, 235 
Elementary set, 255 
measure of, 256 


Empty set, 2 
ε-neighborhood, 46
ε-net, 98

Equicontinuous family of functions, 102 
Equivalence classes, 8 
Equivalence relation, 7 
Equivalent functions, 288 
Equivalent sets, 13 
Essential supremum, 311 
Essentially bounded function, 310 
Euclidean n-space, 38, 144 
Euclidean space(s), 138, 142 ff. 
characterization of, 160 
complete, 153 
norm of vector in, 164 
orthogonal elements of, 164 
components of elements of, 149 
norm in, 142 
separable, 146 
Euler lines, 105 

Exhaustive sequence of sets, 308 
Extension of a functional, 132 
Extension of a measure, 271, 277, 279 
Jordan, 281 

F 

Factor space, 122 
Fatou’s theorem, 307 
Field, 37 

Finite expansion, 33 
Finite function, 208 
Finite set, 10 

First axiom of countability, 83 
First axiom of separation, 85 
Fixed point, 66 
Fixed point theorem, 66 
Fourier coefficients, 149, 152, 165 
Fourier series, 149, 165 
Fractional part, 8 
Fraenkel, A. A., 25, 27 
Fredholm equation, 74 
homogeneous, 74 
kernel of, 74 
nonhomogeneous, 74 
Friedman, A., 212 
Fubini’s theorem, 359 
Function space, 39, 108 
Functional(s), 108, 123
addition of, 183 
additive, 123 

bounded linear (see Bounded linear 
functional) 
Functional(s) (cont.):
conjugate-homogeneous, 123 
conjugate-linear, 124 
continuous, 175 

continuous linear (see Continuous linear 
functionals) 
convex, 130, 134 
extension of, 132 
homogeneous, 123 
linear, 124, 175 ff. 

Minkowski, 131 
null space of, 125 
product of, with a number, 183 
separation of sets by, 136 
Function(s), 4 ff. 
absolutely continuous, 336 
Borel-measurable, 284 
bounded (real), 110, 207 
Cantor, 335 
characteristic, 349 
continuous, 44, 79 
from the left, 315 
from the right, 315 
uniformly, 109 
delta, 124, 208 

domain (of definition of), 4, 5 
equivalent, 288 
essentially bounded, 310 
finite, 207 
general, 5 

generalized (see Generalized functions) 
generating, 362 
infinitely differentiable, 169 
integrable, 294, 296, 308 
locally, 208 
inverse, 5 
jump, 315, 341 
jump of, 315 
left-hand limit of, 315 
lower limit of, 111 
lower semicontinuous, 110 
measurable, 284 ff. 
monotonic, 314 
nondecreasing, 314 
nonincreasing, 314 
of bounded variation, 328-332 
one-to-one, 5 
oscillation of, 111 
range of, 4, 5 
real, 4 

right-hand limit of, 315 
simple, 286 


Function(s) (cont.):
singular, 341 
step, 316 

summable, 294, 296, 308 
test, 208 

uniformly continuous, 109 
upper limit of, 111 
upper semicontinuous, 110 
Fundamental functions (see Test functions) 
Fundamental parallelepiped, 98 
Fundamental sequence (see Cauchy sequence)

Fundamental space (see Test space) 

G 

General measure theory, 269 ff. 

Generalized function(s), 124, 206 ff. 
and differential equations, 211-214 
complex, 215 
convergence of, 209 
definition of, 208 
derivative of, 210 
of several variables, 214-215 
on the circle, 216 
operations on, 209-210 
product of, with a number, 209 
product of, with an infinitely differentiable function, 210
regular, 208 
singular, 208 
sum of, 209 
Godel, K., 209 
Graph, 238 

Greatest lower bound (in a partially ordered 
set), 30 

Gurevich, B. L., 350, 351 

H 

Hahn decomposition, 345 
Hahn-Banach theorem, 132, 180 
complex version of, 134, 181 
Hamel basis, 128 

Hausdorff axiom of separation, 85 
Hausdorff space, 85 
Hausdorff’s maximal principle, 28 
Heine-Borel theorem, 92 
Helly’s convergence theorem, 370 
Helly’s selection principle, 372 
Hereditary property, 87 
Hilbert, D., 155 


Hilbert cube, 98 
Hilbert space(s), 155 ff. 
complex, 165 
countably, 173 
isomorphic, 155, 165 
linear manifold in, 156 
closed, 156 
subspace(s) of, 156 
direct sum of orthogonal, 159 
(mutually) orthogonal, 158 
orthogonal complement of, 157 
Hilbert-Schmidt theorem, 248 
Hölder’s inequality, 41
homogeneity of, 42
Hölder’s integral inequality, 45
Homeomorphic mapping, 44, 89 
Homeomorphic spaces, 44, 89 
Homeomorphism, 44, 89 
Hyperplane, 127 

I 

Ideal, two-sided, 252 
Image: 

of an element, 5 
of a set, 5 
Infimum, 51 
Infinite set, 10 
Initial section, 25 
Inner measure, 258, 276 
Integrable function, 294, 296, 308 
Integral part, 8 
Interior, 128 
Interior point, 50 
Intersection of sets, 2 
Into mapping, 5 
Invariant subspace, 238 
Inverse function, 5 
Invisible point: 
from the left, 319 
from the right, 319 
Isolated point, 47 
Isometry, 44 

Isomorphism, 21, 120, 155, 165 
conjugate-linear, 194, 234 
Isomorphism theorem, 155, 165 

J 

Jordan decomposition, 346 
Jordan extension, 281 


Jordan measurable set, 281 
Jordan measure, 281 
Jump, 315 

Jump function, 315, 341 

K 

Kelley, J. L., 87, 90, 92, 97 
Kernel, 74 

L 

Lattice, 30 

Least upper bound (in a partially ordered 
set), 30 

Lebesgue decomposition, 341, 351, 363 
Lebesgue extension, 277, 279 
Lebesgue integral, 293 ff. 
absolute continuity of, 300-301 
as a set function, 343-351 
indefinite, 313 ff. 

of a general measurable function, 296, 308

of a simple function, 294 
over a set of infinite measure, 308 
vs. Riemann integral, 293-294, 309-310 
Lebesgue-integrable function (see Integrable function)

Lebesgue-Stieltjes integral, 364 
vs. Riemann-Stieltjes integral, 368 
Lebesgue’s bounded convergence theorem, 303

Lebesgue’s theorem: 

on differentiation of a monotonic function, 321

on integration of the derivative of an 
absolutely continuous function, 340 
Left-hand limit, 315 
Levi’s theorem, 305 
Limit of a sequence: 
in a metric space, 47 
in a topological space, 84 
Limit point, 47, 79 
complete, 97 
Linear closure, 140 
Linear combination, 120 
Linear dependence, 120 
Linear functional, 175 ff. 
bounded (see Bounded linear functional)
continuous (see Continuous linear functionals)
Linear hull, 122
Linear independence, 121 
Linear manifold, 140, 156 
Linear operator, 221 
bounded, 223 
norm of, 224 
spectral radius of, 239 
closed, 237 

completely continuous (see Completely 
continuous operators) 
graph of, 238 
Linear space(s), 118 ff.
basis in, 121 
Hamel, 128 
closed segment in, 128 
complex, 119 
countably normed, 171 
dimension of, 121 
algebraic, 128 
finite-dimensional, 121 
functionals on (see Functionals) 
infinite-dimensional, 121 
isomorphic, 120 

linearly dependent elements of, 120 
linearly independent elements of, 121 
K-dimensional, 121 
normed (see Normed linear spaces) 
open segment in, 128 
real, 119 
subspace, 121 
proper, 121 

topological (see Topological linear space) 
Linearly ordered set (see Ordered set) 
Lipschitz condition, 55 
Locally integrable function, 208 
Lower limit, 111

Lower semicontinuous function, 110 
Luzin’s theorem, 293 

M 

Mapping, 5 ff. 
continuous, 44, 87 
contraction, 66 
fixed point of, 66 
into, 5 
natural, 191 
one-to-one, 5 
onto, 5 

order-preserving, 21 
Mathematical expectation, 366 
Mathematical induction, 28 


Mean square deviation, 384 
Mean (value), 366 
Measurable function, 284 ff. 

integration of, 294, 296, 308 
Measurable set(s), 259 ff., 267
decreasing sequence of, 266 
increasing sequence of, 267 
Jordan, 281 
Measure(s), 254 ff. 
additivity of, 255, 263 
complete, 280 
continuity of, 267 
countably (σ-) additive, 266, 272
direct product of, 354 
extension(s) of, 271, 275-283 
inner, 258, 276 
Jordan, 281 

Lebesgue, 259, 276, 279 
of an elementary set, 256 
of a plane set, 259, 276 
of a rectangle, 255 
on a semiring, 270 
outer, 258, 276 
product, 354 
σ-finite, 308
signed, 344 

Stieltjes (see Stieltjes measure) 
with a countable base, 382 
Measure space, 294 

Method of successive approximations, 66, 67

Metric (see Distance) 

Metric space(s), 37 ff. 
complete, 56 
completion of, 62 
continuous curves in, 112-113 
length of, 114, 115 
sequence of, 115 
continuous mapping of, 44 
convergence in, 47 
incomplete, 56 
isometric, 44 
isometric mapping of, 44 
real functions on, 108 
equivalent continuous, 113 
uniformly continuous, 109 
relatively compact subsets of, 101 
separable, 48 
subspace of, 43 
total boundedness of, 97-99 
compactness and, 99-101 
Metrizable space, 90 


Minkowski functional, 131 
Minkowski’s inequality, 41 
Minkowski’s integral inequality, 45 
Monotonic function, 314 

N 

n-dimensional simplex, 137 
k-dimensional face of, 137
vertices of, 137 

n-dimensional (vector) space, 119 
Negative set, 344 
Neighborhood, 46, 79 
Neighborhood base, 83 
at zero, 168 

Nested sphere theorem, 60 
Noncomparable elements, 21 
Nondecreasing function, 314 
Nonincreasing function, 314 
Nonmeasurable set, 268 
Normal space, 86 
Normed linear space(s), 138 
bounded subset of, 141 
complete, 140 
complete set in, 140 
conjugate space of, 184 
direct product of, 238 
subspaces of, 140 
Norm(s), 138, 142, 163 
compatible, 171 
comparable, 172 
equivalent, 141, 172 
of a bounded linear functional, 177 
of a bounded linear operator, 224 
properties of, 138 
stronger, 172 
weaker, 172 
n-space, 119 
Null space, 125 


O

One-to-one correspondence, 5, 10, 13 
One-to-one function, 5 
Onto mapping, 5 
Open ball (see Open sphere) 

Open set(s), 50 
component of, 55 
in a topological space, 78 
on the real line, 51 
unions and intersections of, 50 


Open sphere, 45 
center of, 46 
radius of, 46 
Operator(s), 221 ff. 
adjoint, 232 
in Hilbert space, 234 
continuous, 221 
degenerate, 240 
domain (of definition) of, 221 
eigenvalue of, 235 
eigenvector of, 235 
identity (or unit), 222 
inverse, 228 
invertible, 228 
linear (see Linear operator) 
product of, 225 
with a number, 225 
projection, 223 
resolvent of, 236 
self-adjoint, 235 
spectrum of, 235 
sum of, 225
zero, 222 

Order type (see Type) 

Ordered product, 23 
Ordered set, 21 
Ordered sum, 22 
Order-preserving mapping, 21 
Ordinal, 24 
transfinite, 24 
Ordinal number(s), 24 
comparison of, 25 
Orthogonal basis, 143 
Orthogonal complement, 157 
Orthogonal system, 143 
complete, 143 
Orthogonal vectors, 143 
Orthogonalization, 148 
Orthogonalization theorem, 147 
Orthonormal basis, 143 
Orthonormal system, 143 
closed, 151 
complete, 143 
vs. closed, 151 
Oscillation, 111 
Outer measure, 258, 276

P 

Parseval’s theorem, 151 
Partial ordering, 20 
Partially ordered set(s), 20 









Partially ordered set(s) (cont.):
isomorphic, 21 
maximal element of, 21 
minimal element of, 21 
noncomparable elements of, 21 
Partition of a set into classes, 6-9 
Peano’s theorem, 104 
Petrovski, I. G., 76 
Picard’s theorem, 71 
Polygonal line, 55 
Positive set, 344 
Power: 
of a set, 16 
of the continuum, 16 
Preimage: 
of a set, 5 
of an element, 5 

Principle of contraction mapping, 66 
Probability density, 367 
Product measure, 354, 356 
evaluation of, 356-359 
Projection operator, 223 
Proper subspace, 121 

Q 

Quotient space (see Factor space) 

R 

Radon-Nikodym derivative, 350 
Radon-Nikodym theorem, 347 
Random variable, 366 
continuous, 366 
discrete, 366 

mathematical expectation of, 366 
mean (value) of, 366 
probability density of, 367 
variance of, 366 
Range, 4, 5
Rectangle, 255 
closed, 255 
half-open, 255 
measure of, 255 
open, 255 

Rectifiable curve, 332 
Reflexive space, 191 
Reflexivity, 7 
Relation, 7 
antisymmetric, 7 
binary, 7 
equivalence, 7 


Relation (cont.): 
reflexive, 7 
symmetric, 7 
transitive, 7 

Relatively compact subset, 97 
Relatively countably compact subset, 97 
Residue class, 122 
Resolvent, 236 
Riemann integral, 293 
vs. Lebesgue integral, 293-294, 309-310 
Riemann-Stieltjes integral, 367 
vs. Lebesgue-Stieltjes integral, 368 
Riesz lemma, 319 
Riesz representation theorem, 374 
Riesz-Fischer theorem, 153 
Right-hand limit, 315 
Ring of sets, 31 

minimal, generated by a semiring, 34 
minimal, generated by a system of sets, 32 
Rozanov, Y. A., 366 

S 

Scalar product, 142 
complex, 163 
Schwartz, L., 212 
Schwarz’s inequality, 40, 142 
Second axiom of countability, 82 
Second axiom of separation, 85 
Self-adjoint operator, 235 
Semireflexive space, 191 
Semiring of sets, 32 
finite expansion in, 33 
minimal ring generated by, 34 
Separable (metric) space, 48 
Set of σ-uniqueness, 282
Set of uniqueness, 282 
Set theory, 1-36 

naive vs. axiomatic, 29 
Set(s), 1 ff. 
algebra of, 31 
bounded, 65, 141
totally, 98 
Cantor, 52 
closed, 49 
closure of, 46 
complement of, 3 
connected, 55 
contact point of, 46 
convex, 129 
countable, 10 

curly bracket notation for, 1 


Set(s) (cont.): 
decomposition of, 6 
dense, 48 
everywhere, 48 
nowhere, 48, 61 
diameter of, 65 
difference between, 3 
direct product of, 352 
directed, 29 
disjoint, 2 
pairwise, 2 

duality principle for, 4 
elementary, 255 
elements of, 1 
empty, 2 
equivalent, 13 
exhaustive sequence of, 308 
finite, 10 
infinite, 10 
interior of, 128 
interior point of, 50 
intersection of, 2 
isolated point of, 47 
Jordan measurable, 281 
(Lebesgue) measurable, 259, 267, 276, 279

limit point of, 47 
complete, 97 

measure of, 259, 267, 276, 279 
negative, 344 
nonmeasurable, 268 
of uniqueness, 282 
of σ-uniqueness, 282
open, 50 

operations on, 2 ff. 
ordered, 21 
partially ordered, 20 
partition of, 6 
positive, 344 
power of, 16 
ring of, 31 
semiring of, 32 
subset of, 1 
proper, 2 
sum of, 2 
symmetric, 171 
symmetric difference of, 3, 4
systems of, 31-36 
totally bounded, 98 
uncountable, 10 
union of, 2 
well-ordered, 23 


Shilov, G. E., 147, 155, 245, 350, 351 
σ-additivity (see Countable additivity)
σ-algebra, 35
σ-finite measure, 308
σ-ring, 35

Signed measure, 344 

Silverman, R. A., 76, 140, 147, 247, 350, 366

Simple function, 286 

Simplex (see n-dimensional simplex) 

Simply ordered set (see Ordered set) 
Singular charge, 347 
Singular function, 341 
Smirnov, V. I., 247 
Space: 
c, 120 
c₀, 120
C[a, b], 39, 57
C₂[a, b], 40, 59
Cⁿ, 119
C(I,R), 113 

of isolated points, 38, 56 
of rapidly decreasing sequences, 172 
l₂, 39, 57
lₚ, 43
L₁, 378
L₂, 383
m, 41, 120
R¹, 38, 56
Rⁿ, 38, 57
*°°, 120 
* 2,41 

Spectral radius, 239 
Spectrum, 235 
continuous, 236 
point, 236 

regular point of, 235 
Step function, 211, 316 
Stereographic projection, 14 
Stieltjes integral (see Lebesgue-Stieltjes 
integral) 

Stieltjes measure, 362, 364 
absolutely continuous, 363 
discrete, 363 

generating function of, 362 
singular, 363 
Strong convergence, 195 
Strong topology, 184 
in conjugate space, 190 
Subcover, 83 
Subset, 1 
proper, 2 
Subspace, 121
closed, 140 

generated by a set, 122 
invariant, 238 
proper, 121 

Successive approximations, method of, 66, 67

Sum of sets, 2 

Summable function, 294, 296, 308 
complex, 387 
Supremum, 41, 51 
Symmetric difference, 3, 4 
Symmetric set, 171 
Symmetry, 7 
System of sets, 31 
centered, 92 
trace of, 80 
unit of, 31 

T 

Test functions, 208 
convergence of, 208 
Test space, 208, 216 
Tolstov, G. P., 140, 145
Topological linear space, 138, 167 ff. 
bounded subset of, 169 
continuous mapping of, 87 
functionals on, 175 
continuous, 175 
continuous linear, 175 ff. 
linear, 175 
locally bounded, 169 
locally convex, 169 
neighborhood base at zero of, 168 
normable, 169 
weak topology in, 195 
Topological space(s), 78 ff. 
base for, 81 
bicompact, 96 
closed sets of, 79 
compact, 92 
completely regular, 92 
connected, 84 
convergence in, 84 
countably compact, 95 
cover (covering) of, 83 
hereditary property of, 87 
locally compact, 97 
metrizable, 90 
normal, 86 
open sets of, 78 


Topological space(s) (cont.):
points of, 79 
real functions on, 108 
relatively compact subset of, 97 
relatively countably compact subset of, 97
with a countable base, 82 
Topology, 78 

generated by a system of sets, 80 

relative, 80 

strong, 184, 190 

stronger, 80 

weak, 195, 200 

weak*, 202 

weaker, 80 

Total variation, 328, 346 
Totally bounded set, 98 
Transcendental number, 19 
Transfinite induction, 29 
Transfinite ordinal, 24 
Transitivity, 7 
Triangle inequality, 37, 138 
T₁-space, 85
T₂-space, 85
Two-sided ideal, 252 
Tychonoff space, 92 
Type(s), 22 
ordered sum of, 23 
ordered product of, 23 
vs. power, 22 

U 

Uncountability of real numbers, 15 

Uncountable set, 10 

Uniform continuity, 109 

Uniformly bounded family of functions, 102 

Union of sets, 2 

Unit (of a system of sets), 31 

Upper bound (in a partially ordered set), 28

Upper limit, 111 

Upper semicontinuous function, 110 

Urysohn’s lemma, 91 

Urysohn’s metrization theorem, 90 

V 

van der Waerden, B. L., 327 
Variance, 366 
Variation: 
bounded, 328 
negative, 346 
positive, 346 
total, 328, 346 
Vector space (see Linear space)
Volterra equation, 75 
Volterra operator, 243

W


Weak convergence, 195 
of functionals, 200 
Weak* convergence, 202 
Weak topology, 195 
in conjugate space, 200 
Weak* topology, 202 
Weierstrass’ approximation theorem, 145


Well-ordered set, 23 
(initial) section of, 25 
order type of, 24 
remainder of, 25 
smallest element of, 23 
Well-ordering theorem, 27 


Z 


Zermelo, E., 27 
Zero element, 118, 140
Zorn’s lemma, 28 




